Preface


Read This First 


About This Manual 


This user’s guide serves as a reference book for the TMS320C3x generation
of digital signal processors, which includes the TMS320C30, TMS320C31,
TMS320LC31, and TMS320C32. Throughout the book, ’C3x refers collectively
to the ’C30, ’C31, and ’C32, and the names TMS320C30, TMS320C31, and
TMS320C32 refer to all speed variations unless an exception is noted.
This document provides information to assist managers and hardware/software
engineers in application development.

Specifically, this book complements the TMS320C3x User’s Guide by providing
information to assist you in application development. It includes example
code and hardware connections for various applications.

This guide presents examples of frequently used applications and discusses
more involved examples and applications. It also defines the principles
involved in many applications and gives the corresponding assembly language
code for instructional purposes and for immediate use. Whenever a detailed
explanation of the underlying theory is too extensive to be included in this
manual, appropriate references are given for further information.















Notational Conventions 

This document uses the following conventions: 

□ Program listings, program examples, and interactive displays are shown
in a special typeface that is similar to that of a typewriter. Examples
use a bold version of the special typeface for emphasis. Interactive
displays use a bold version of the special typeface to distinguish commands
that you enter from items that the system displays (such as
prompts, command output, error messages, etc.).

The following is a sample program listing: 


0011  0005  0001        .field   1, 2
0012  0005  0003        .field   3, 4
0013  0005  0006        .field   6, 3
0014  0006              .even




The following is an example of a system prompt and a command you might 
enter: 

C: csr -a /user/ti/simuboard/utilities 

□ Any string within angle brackets is considered to be a variable. In syntax 
descriptions, the variable is written in a typeface similar to that of the text. 
The following is an example of a variable syntax: 

<file name> Path name of a UNIX file 
<signal> Name of a signal 

□ In syntax descriptions, the instruction, command, or directive is in a bold
typeface font and parameters are in an italic typeface. Portions of a syntax
that are in bold should be entered as shown. Portions of a syntax
that are in italics describe the type of information that should be entered.
The following is an example of a directive syntax:


.asect "section name", address


In the preceding example, “.asect” is the directive. This directive has two 
parameters, indicated by section name and address. When you use 
“.asect,” the first parameter must be an actual section name, enclosed in 
double quotes; the second parameter must be an address. 

□ Square brackets ([ and ]) identify an optional parameter. If you use an 
optional parameter, you must specify the information within the brackets; 
you must not enter the brackets themselves. The following is an example 
of an instruction that has an optional parameter: 


LALK 16-bit constant [, shift] 


The LALK instruction has two parameters. The first parameter, 16-bit constant,
is required. The second parameter, shift, is optional. As this syntax
shows, if you use the optional second parameter, you must precede it with
a comma.

Square brackets are also used as part of the pathname specification for
VMS pathnames. In this case, the brackets are actually part of the pathname
(they are not optional).

□ In assembler syntax statements, column 1 is reserved for the first character
of a label or symbol. If the label or symbol is optional, it is usually not
shown. If it is a required parameter, it is shown starting against the left
margin of the shaded box, as in the example below. No instruction, command,
directive, or parameter, other than a symbol or label, can begin in
column 1.


symbol .usect "section name", size in bytes [, alignment] 


The symbol is required for the .usect directive and must begin in column 1.
The section name must be enclosed in quotes, and the parameter size in
bytes must be separated from the section name by a comma. The alignment
is optional and, if used, must be separated by a comma.

□ Braces ( { and } ) indicate a list. The symbol | (read as or) separates items
within the list. The following is an example of a list:

{* | *+ | *-}

This provides three choices: *, *+, or *-.

Unless the list is enclosed in square brackets, you must choose one item
from the list.

□ Some directives can have a varying number of parameters. For example,
the .byte directive can have up to 100 parameters. The syntax for this
directive is:


.byte value1 [, ... , valuen]


Note that .byte does not begin in column one.

This syntax shows that .byte must have at least one value parameter, but
you have the option of supplying additional value parameters, each separated
from the previous one by a comma.


Information About Cautions and Warnings 

This book may contain cautions and warnings. 


This is an example of a caution statement.

A caution statement describes a situation that could potentially
damage your software or equipment.


This is an example of a warning statement.

A warning statement describes a situation that could potentially 
cause harm to you. 


The information in a caution or a warning is provided for your protection. 
Please read each caution and warning carefully. 


Related Documentation From Texas Instruments 

The following books describe the TMS320 floating-point devices and related
support tools. To obtain a copy of any of these TI documents, call the Texas
Instruments Literature Response Center at (800) 477-8924. When ordering,
please identify the book by its title and literature number.

JTAG/MPSD Emulation Technical Reference (literature number SPDU079) 
provides the design requirements of the XDS510™ emulator controller, 
discusses JTAG designs (based on the IEEE 1149.1 standard), and 
modular port scan device (MPSD) designs. 

Setting Up TMS320 DSP Interrupts in C Application Report (literature 
number SPRA036) describes methods of setting up interrupts for the 
TMS320 family of processors in C programming language. Sample code 
segments are provided, along with complete examples of how to set up 
interrupt vectors. 

TLC32040C, TLC32040I, TLC32041C, TLC32041I Analog Interface Circuits
(literature number SLAS014E) data sheet contains the electrical and
timing specifications for these devices, as well as signal descriptions and
pinouts for all of the available packages.

TMS320C3x/C4x Assembly Language Tools User’s Guide (literature number
SPRU035) describes the assembly language tools (assembler, linker,
and other tools used to develop assembly language code), assembler
directives, macros, common object file format, and symbolic debugging
directives for the ’C3x and ’C4x generations of devices.

TMS320C3x/C4x Code Generation Tools Getting Started Guide (literature 
number SPRU119) describes how to install the TMS320C3x/C4x 
assembly language tools and the C compiler. Installation instructions are 
included for MS-DOS™, Windows 3.x, Windows NT, Windows 95, 
SunOS™, Solaris, and HP-UX™ systems. 

TMS320C3x/C4x Optimizing C Compiler User’s Guide (literature number
SPRU034) describes the TMS320 floating-point C compiler. This C compiler
accepts ANSI standard C source code and produces TMS320 assembly
language source code for the ’C3x and ’C4x generations of devices.

TMS320C3x C Source Debugger (literature number SPRU053) describes
the ’C3x debugger for the emulator, evaluation module, and simulator.
This book discusses various aspects of the debugger interface, including
window management, command entry, code execution, data management,
and breakpoints. It also includes a tutorial that introduces basic debugger
functionality.




TMS320C3x User’s Guide (literature number SPRU031) describes the ’C3x
32-bit floating-point microprocessor (developed for digital signal processing
as well as general applications), its architecture, internal register
structure, instruction set, pipeline, specifications, and DMA and serial
port operation. Software and hardware applications are included.

TMS320C30 Digital Signal Processor (literature number SPRS032A) data 
sheet contains the electrical and timing specifications for this device, as 
well as signal descriptions and pinouts for all of the available packages. 

TMS320C31, TMS320LC31 Digital Signal Processors (literature number 
SPRS035) data sheet contains the electrical and timing specifications for 
these devices, as well as signal descriptions and pinouts for all of the 
available packages. 

TMS320C32 Digital Signal Processor (literature number SPRS027C) data 
sheet contains the electrical and timing specifications for this device, as 
well as signal descriptions and pinouts for all of the available packages. 

TMS320 DSP Development Support Reference Guide (literature number 
SPRU011) describes the TMS320 family of digital signal processors and 
the tools that support these devices. Included are code-generation tools 
(compilers, assemblers, linkers, etc.) and system integration and debug 
tools (simulators, emulators, evaluation modules, etc.). Also covered are 
available documentation, seminars, the university program, and factory 
repair and exchange. 

TMS320 Family Development Support Reference Guide (literature number
SPRU011E) describes the TMS320 family of digital signal processors
and the various products that support it. This includes code-generation
tools (compilers, assemblers, linkers, etc.) and system integration and
debug tools (simulators, emulators, evaluation modules, etc.). This book
also lists related documentation, outlines seminars and the university
program, and provides factory repair and exchange information.


TMS320 Third-Party Support Reference Guide (literature number
SPRU052C) alphabetically lists over 100 third parties who supply various
products that serve the family of TMS320 digital signal processors,
including software and hardware development tools, speech recognition,
image processing, noise cancellation, modems, etc.




If You Need Assistance... 


□ World-Wide Web Sites

  TI Online                                        http://www.ti.com
  Semiconductor Product Information Center (PIC)   http://www.ti.com/sc/docs/pic/home.htm
  DSP Solutions                                    http://www.ti.com/dsps
  320 Hotline On-line™                             http://www.ti.com/sc/docs/dsps/support.htm


□ North America, South America, Central America

  Product Information Center (PIC)          (972) 644-5580
  TI Literature Response Center U.S.A.      (800) 477-8924
  Software Registration/Upgrades            (214) 638-0333    Fax: (214) 638-7742
  U.S.A. Factory Repair/Hardware Upgrades   (281) 274-2285
  U.S. Technical Training Organization      (972) 644-5580
  DSP Hotline                               (281) 274-2320    Fax: (281) 274-2324
  DSP Modem BBS                             (281) 274-2323
  DSP Internet BBS via anonymous ftp to ftp://ftp.ti.com/pub/tms320bbs

  Email: dsph@ti.com


□ Europe, Middle East, Africa

  European Product Information Center (EPIC) Hotlines:
    Multi-Language Support               +33 1 30 70 11 69     Fax: +33 1 30 70 10 32
                                                               Email: epic@ti.com
    Deutsch                              +49 8161 80 33 11 or +33 1 30 70 11 68
    English                              +33 1 30 70 11 65
    Francais                             +33 1 30 70 11 64
    Italiano                             +33 1 30 70 11 67
  EPIC Modem BBS                         +33 1 30 70 11 99
  European Factory Repair                +33 4 93 22 25 40
  Europe Customer Training Helpline                            Fax: +49 81 61 80 40 10


□ Asia-Pacific

  Literature Response Center    +852 2 956 7288    Fax: +852 2 956 2200
  Hong Kong DSP Hotline         +852 2 956 7268    Fax: +852 2 956 1002
  Korea DSP Hotline             +82 2 551 2804     Fax: +82 2 551 2828
  Korea DSP Modem BBS           +82 2 551 2914
  Singapore DSP Hotline                            Fax: +65 390 7179
  Taiwan DSP Hotline            +886 2 377 1450    Fax: +886 2 377 2718
  Taiwan DSP Modem BBS          +886 2 376 2592
  Taiwan DSP Internet BBS via anonymous ftp to ftp://dsp.ee.tit.edu.tw/pub/TI/


□ Japan

  Product Information Center   +0120-81-0026 (in Japan)                Fax: +0120-81-0036 (in Japan)
                               +03-3457-0972 or (INTL) 813-3457-0972   Fax: +03-3457-1259 or (INTL) 813-3457-1259
  DSP Hotline                  +03-3769-8735 or (INTL) 813-3769-8735   Fax: +03-3457-7071 or (INTL) 813-3457-7071
  DSP BBS via Nifty-Serve      Type "Go TIASP"


□ Documentation 

When making suggestions or reporting errors in documentation, please include the following information that is on the title
page: the full title of the book, the publication date, and the literature number.

  Mail: Texas Instruments Incorporated                  Email: dsph@ti.com
        Technical Documentation Services, MS 702
        P.O. Box 1443
        Houston, Texas 77251-1443

Note: When calling a Literature Response Center to order documentation, please specify the literature number of the
book.
Chapter 1


Processor Initialization 


Before you execute a DSP algorithm, you must initialize the processor. Initialization
brings the processor to a known state. Generally, this occurs anytime after
the processor is reset. This chapter reviews the concepts of processor initialization
explained in the user’s guide and provides examples.


1.1 Reset Process 

You can reset the processor by applying a low level to the RESET input for at least
ten H1 cycles. The ’C3x terminates execution and puts the reset vector (the
contents of memory location 0) in the program counter. The reset vector normally
contains the address of the system-initialization routine. The hardware
reset also initializes various registers and status bits.

In order to reset the ’C3x correctly, you need to comply with several hardware 
and software requirements: 

□ If the ’C31 or ’C32 is in microcomputer mode, set the INTx pins (as discussed
in the Using the TMS320C31 and TMS320C32 Boot Loaders chapter
of the TMS320C3x User’s Guide) so that the boot loader works properly.

□ Provide the correct reset vector value; the reset vector normally contains 
the address of the system initialization routine. 

■ In microcomputer mode, the reset vector is initialized automatically by 
the processor to point to the beginning of the on-chip boot loader code. 
No user action is required. 

■ In microprocessor mode, the reset vector is typically stored in an 
EPROM. Example 1-1 on page 1-5 shows how you can initialize that 
vector. 


□ Apply a low level to the RESET input (see section 1.2). 


1.2 Reset Signal Generation 

The reset input controls the initialization of internal ’C3x logic and also causes
the execution of the system initialization software. For proper system initialization,
the reset signal must be applied for at least ten H1 cycles, that is, 600 ns
for a ’C3x operating at 33.33 MHz. Upon power-up, however, it can take 20 ms
or more before the system oscillator reaches a stable operating state. Therefore,
the power-up reset circuit should generate a low pulse on the reset line for
100 to 200 ms. Once a proper reset pulse has been applied, the processor
fetches the reset vector from location 0, which contains the address of the system
initialization routine. Figure 1-1 shows a circuit that generates an appropriate
power-up reset signal.
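The timing figures above can be checked with a short calculation. The C sketch below assumes, as on the ’C3x, that the H1 clock runs at half the CLKIN frequency, so a part with a 33.33-MHz CLKIN has a 60-ns H1 cycle; the function names are hypothetical and the oscillator start-up time is passed in as a parameter rather than measured.

```c
/* Hypothetical helpers for sizing the 'C3x reset pulse.
 * Assumption: the H1 period is 2 / f(CLKIN), i.e. H1 runs at CLKIN/2. */

/* Length of one H1 cycle in nanoseconds for a CLKIN frequency in MHz. */
double h1_period_ns(double clkin_mhz)
{
    return 2.0 * 1000.0 / clkin_mhz;
}

/* Minimum RESET low time required by the device: ten H1 cycles. */
double min_reset_low_ns(double clkin_mhz)
{
    return 10.0 * h1_period_ns(clkin_mhz);
}

/* A power-up reset must also outlast oscillator start-up, so the pulse
 * has to cover the larger of the two requirements.
 * Returns 1 if pulse_ms is long enough, 0 otherwise. */
int reset_pulse_ok(double pulse_ms, double clkin_mhz, double osc_startup_ms)
{
    double needed_ms = min_reset_low_ns(clkin_mhz) / 1.0e6;  /* ns -> ms */
    if (osc_startup_ms > needed_ms)
        needed_ms = osc_startup_ms;
    return pulse_ms >= needed_ms;
}
```

With a 33.33-MHz CLKIN, min_reset_low_ns() gives about 600 ns, while a 20-ms oscillator start-up dominates at power-up; this is why the text recommends a 100- to 200-ms pulse rather than the device minimum.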

Figure 1-1. Reset Circuit 
1.3 How to Initialize the Processor

After reset, the ’C3x jumps to the address stored in the reset vector location
and starts execution from that point. The reset vector normally contains the
address of the system initialization routine.

The initialization routine typically performs several tasks: 

□ Sets the data-page pointer (DP) register 

□ Sets the stack pointer 

□ Sets the interrupt vector table 

□ Sets the trap vector table 

□ Sets the external memory control register 

□ Clears/enables cache 

Note:

When running in microcomputer mode (MCBL/MP = 1), the on-chip boot
loader automatically initializes the external memory-control registers
from the boot-loader table.

The ’C3x can be initialized using assembly language or C. 

1.3.1 Processor Initialization Under Assembly Language 

If you are running in an assembly-only environment, Example 1-1 on
page 1-5 provides a basic initialization routine. This example shows code that
initializes the ’C3x to the following machine state:

□ All interrupts are enabled. 

□ The overflow mode is disabled. 

□ The program cache is enabled. 

□ The DP register is initialized to 0. 

□ The memory-mapped control registers are initialized. 

□ The internal memory is filled with 0s.
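The last item corresponds to the RPTS loop in Example 1-1, which repeats a parallel pair of STF instructions. Because RPTS N executes the following instruction N + 1 times, RPTS 1023 clears 1024 words in each block, which matches the 400h-word spacing between the block base addresses 809800h and 809C00h. The C sketch below only models the effect of that loop on host arrays; the function and macro names are hypothetical.

```c
#include <stddef.h>

/* 809C00h - 809800h = 400h = 1024 words per on-chip RAM block */
#define RAM_BLOCK_WORDS 1024u

/* Hypothetical host-side model of:
 *         RPTS  count
 *         STF   R0,*AR0++(1)
 *      || STF   R0,*AR1++(1)
 * RPTS count executes the next instruction count + 1 times.
 * Returns the number of words written to each block. */
size_t clear_two_blocks(float *blk0, float *blk1, unsigned count)
{
    size_t i;
    size_t n = (size_t)count + 1u;  /* RPTS 1023 -> 1024 iterations */

    for (i = 0; i < n; i++) {
        blk0[i] = 0.0f;             /* STF R0,*AR0++(1) */
        blk1[i] = 0.0f;             /* || STF R0,*AR1++(1) */
    }
    return n;
}
```

Writing both blocks in the same iteration mirrors the parallel STF || STF form, which halves the number of repeated instructions compared with clearing the blocks one after the other.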
Example 1-1. TMS320C3x Processor Initialization


*
* TITLE PROCESSOR INITIALIZATION
*
        .global RESET,INIT,BEGIN
        .global INT0,INT1,INT2,INT3
        .global ISR0,ISR1,ISR2,ISR3
        .global DINT,DMA
        .global TINT0,TINT1,XINT0,RINT0,XINT1,RINT1
        .global TIME0,TIME1,XMT0,RCV0,XMT1,RCV1
        .global TRAP0,TRAP1,TRAP2,TRP0,TRP1,TRP2
*
* PROCESSOR INITIALIZATION FOR THE TMS320C3x
*
* RESET AND INTERRUPT VECTOR SPECIFICATION. THIS
* ARRANGEMENT ASSUMES THAT DURING LINKING, THE FOLLOWING
* TEXT SEGMENT WILL BE PLACED TO START AT MEMORY
* LOCATION 0.
*
        .sect   "init"          ; Named section
RESET   .word   INIT            ; RS- loads address INIT to PC
INT0    .word   ISR0            ; INT0- loads address ISR0 to PC
INT1    .word   ISR1            ; INT1- loads address ISR1 to PC
INT2    .word   ISR2            ; INT2- loads address ISR2 to PC
INT3    .word   ISR3            ; INT3- loads address ISR3 to PC
XINT0   .word   XMT0            ; Serial port 0 transmit interrupt processing
RINT0   .word   RCV0            ; Serial port 0 receive interrupt processing
XINT1   .word   XMT1            ; Serial port 1 transmit interrupt processing
RINT1   .word   RCV1            ; Serial port 1 receive interrupt processing
TINT0   .word   TIME0           ; Timer 0 interrupt processing
TINT1   .word   TIME1           ; Timer 1 interrupt processing
DINT    .word   DMA             ; DMA interrupt processing
        .space  20              ; Reserved space
TRAP0   .word   TRP0            ; Trap 0 vector processing begins
TRAP1   .word   TRP1            ; Trap 1 vector processing begins
TRAP2   .word   TRP2            ; Trap 2 vector processing begins
        .space  29              ; Leave space for the other 29 traps
*
* IN THE FOLLOWING SECTION, CONSTANTS THAT CANNOT BE REPRESENTED
* IN THE SHORT FORMAT ARE INITIALIZED. THE NUMBERS IN PARENTHESES
* AT THE END OF EACH COMMENT REPRESENT THE OFFSET OF THE
* REGISTER FROM 808000H (CTRL)
*
        .data
MASK     .word  0FFFFFFFFH
BLK0     .word  0809800H        ; Beginning address of RAM block 0
BLK1     .word  0809C00H        ; Beginning address of RAM block 1
STCK     .word  0809F00H        ; Beginning of stack
CTRL     .word  0808000H        ; Pointer for peripheral-bus memory map
DMACTL   .word  0000000H        ; Init for DMA control (0)
TIM0CTL  .word  0000000H        ; Init of timer 0 control (32)
TIM1CTL  .word  0000000H        ; Init of timer 1 control (48)
SERGLOB0 .word  0000000H        ; Init of serial 0 glbl control (64)
SERPRTX0 .word  0000000H        ; Init of serial 0 xmt port control (66)
SERPRTR0 .word  0000000H        ; Init of serial 0 rcv port control (67)
SERTIM0  .word  0000000H        ; Init of serial 0 timer control (68)
SERGLOB1 .word  0000000H        ; Init of serial 1 glbl control (80)
SERPRTX1 .word  0000000H        ; Init of serial 1 xmt port control (82)
SERPRTR1 .word  0000000H        ; Init of serial 1 rcv port control (83)
SERTIM1  .word  0000000H        ; Init of serial 1 timer control (84)
PARINT   .word  0000000H        ; Init of parallel interface control (100)
IOINT    .word  0000000H        ; Init of I/O interface control (96)
*
        .text
*
* THE ADDRESS AT MEMORY LOCATION 0 DIRECTS EXECUTION TO BEGIN HERE
* FOR RESET PROCESSING THAT INITIALIZES THE PROCESSOR. WHEN RESET
* IS APPLIED, THE FOLLOWING REGISTERS ARE INITIALIZED TO 0:
*     ST  - CPU STATUS REGISTER
*     IE  - CPU/DMA INTERRUPT ENABLE FLAGS
*     IF  - CPU INTERRUPT FLAGS
*     IOF - I/O FLAGS
*
* THE STATUS REGISTER HAS THE FOLLOWING ARRANGEMENT:
* BITS:     31-14 13  12 11 10 9     8  7   6   5  4  3 2 1 0
* FUNCTION: RESRV GIE CC CE CF RESRV RM OVM LUF LV UF N Z V C
*
INIT    LDP     0,DP            ; Point the DP register to page 0
        LDI     1800H,ST        ; Clear and enable cache, and disable OVM
        LDI     @MASK,IE        ; Unmask all interrupts
*
* INTERNAL DATA MEMORY INITIALIZATION TO FLOATING POINT 0
*
        LDI     @BLK0,AR0       ; AR0 points to block 0
        LDI     @BLK1,AR1       ; AR1 points to block 1
        LDF     0.0,R0          ; 0 register R0
        RPTS    1023            ; Repeat 1024 times ...
        STF     R0,*AR0++(1)    ; Zero out location in RAM block 0 and ...
||      STF     R0,*AR1++(1)    ; Zero out location in RAM block 1
*
* THE PROCESSOR IS INITIALIZED. THE REMAINING APPLICATION-
* DEPENDENT PART OF THE SYSTEM (BOTH ON- AND OFF-CHIP) SHOULD
* NOW BE INITIALIZED.
*
* FIRST, INITIALIZE THE CONTROL REGISTERS. IN THIS EXAMPLE,
* EVERYTHING IS INITIALIZED TO 0, SINCE THE ACTUAL INITIALIZATION IS
* APPLICATION-DEPENDENT.
*
        LDI     @CTRL,AR0       ; Load in AR0 the pointer to control
                                ; registers
        LDI     @DMACTL,R0
        STI     R0,*+AR0(0)     ; Init DMA control
        LDI     @TIM0CTL,R0
        STI     R0,*+AR0(32)    ; Init timer 0 control
        LDI     @TIM1CTL,R0
        STI     R0,*+AR0(48)    ; Init timer 1 control
        LDI     @SERGLOB0,R0
        STI     R0,*+AR0(64)    ; Init serial 0 global control
        LDI     @SERPRTX0,R0
        STI     R0,*+AR0(66)    ; Init serial 0 xmt control
        LDI     @SERPRTR0,R0
        STI     R0,*+AR0(67)    ; Init serial 0 rcv control
        LDI     @SERTIM0,R0
        STI     R0,*+AR0(68)    ; Init serial 0 timer control
        LDI     @SERGLOB1,R0
        STI     R0,*+AR0(80)    ; Init serial 1 global control
        LDI     @SERPRTX1,R0
        STI     R0,*+AR0(82)    ; Init serial 1 xmt control
        LDI     @SERPRTR1,R0
        STI     R0,*+AR0(83)    ; Init serial 1 rcv control
        LDI     @SERTIM1,R0
        STI     R0,*+AR0(84)    ; Init serial 1 timer control
        LDI     @PARINT,R0
        STI     R0,*+AR0(100)   ; Init parallel interface
                                ; control (C30 only)
        LDI     @IOINT,R0
        STI     R0,*+AR0(96)    ; Init I/O interface control
*
        LDI     @STCK,SP        ; Init the stack pointer
        OR      2000H,ST        ; Global interrupt enable
        BR      BEGIN           ; Branch to the beginning of application
        .end
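The vector table and control-register displacements in Example 1-1 can be cross-checked with a little address arithmetic. The C sketch below is a hypothetical helper, not TI code: the enum values restate the order of the .word and .space directives in the "init" section (which the example assumes is linked at address 0) and the decimal displacements used with *+AR0(n), and ctrl_reg_addr() adds a displacement to the CTRL value 808000h.

```c
/* Hypothetical cross-check of the addresses used in Example 1-1. */

/* Word offsets in the "init" section, assumed to be linked at 0. */
enum vector_offset {
    VEC_RESET = 0x00,
    VEC_INT0  = 0x01, VEC_INT1  = 0x02, VEC_INT2  = 0x03, VEC_INT3  = 0x04,
    VEC_XINT0 = 0x05, VEC_RINT0 = 0x06, VEC_XINT1 = 0x07, VEC_RINT1 = 0x08,
    VEC_TINT0 = 0x09, VEC_TINT1 = 0x0A, VEC_DINT  = 0x0B,
    VEC_RESERVED    = 20,                        /* .space 20 after DINT */
    VEC_TRAP0       = VEC_DINT + 1 + VEC_RESERVED, /* first trap vector */
    VEC_TRAPS       = 3 + 29,                    /* TRAP0-2 plus .space 29 */
    VEC_TABLE_WORDS = VEC_TRAP0 + VEC_TRAPS      /* words in "init" */
};

#define CTRL_BASE 0x808000ul    /* value stored at CTRL */

/* Decimal displacements used with STI R0,*+AR0(n) in Example 1-1. */
enum ctrl_offset {
    OFF_DMACTL   = 0,
    OFF_TIM0CTL  = 32, OFF_TIM1CTL  = 48,
    OFF_SERGLOB0 = 64, OFF_SERPRTX0 = 66, OFF_SERPRTR0 = 67, OFF_SERTIM0 = 68,
    OFF_SERGLOB1 = 80, OFF_SERPRTX1 = 82, OFF_SERPRTR1 = 83, OFF_SERTIM1 = 84,
    OFF_IOINT    = 96, OFF_PARINT   = 100
};

/* Address written when AR0 holds CTRL and the displacement is off. */
unsigned long ctrl_reg_addr(enum ctrl_offset off)
{
    return CTRL_BASE + (unsigned long)off;
}
```

For example, the .space 20 after DINT places TRAP0 at 20h, the whole table occupies 64 words, and the *+AR0(100) store lands at 808064h.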
1.3.2 Processor Initialization Under C Language

If you are running in a C environment, your initialization routine is typically
boot.asm (from the RTS30.LIB library that comes with the floating-point compiler).
In addition to initializing global variables, boot.asm initializes the DP register
(pointing to the .bss section) and the stack pointer (SP) register (pointing
to the .stack section). You must enable the cache, as shown in Example 1-2,
and set up your interrupts inside your main routine before you enable interrupts.
See the application report Setting Up TMS320 DSP Interrupts in C for
more information.

Example 1-2. Enabling the Cache 


main()
{
    asm("    or    1800h,st");       /* enable cache */
/*  asm("    or    3800h,st");  */   /* enable cache and interrupts */
}
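Examples 1-1 and 1-2 set the same status-register bits: 1800h sets CC and CE (clear and enable the cache), and 2000h sets GIE, so 3800h does all three. The C macros below restate the ST bit map given in the comments of Example 1-1; the macro names are hypothetical. Note that the TMS320C3x assembler treats a constant without an H suffix as decimal, so writing 1800 instead of 1800h would give 708h.

```c
/* Hypothetical bit masks for the 'C3x status register (ST), taken
 * from the bit map in the comments of Example 1-1. */
#define ST_C    (1u << 0)   /* carry */
#define ST_V    (1u << 1)   /* overflow */
#define ST_Z    (1u << 2)   /* zero */
#define ST_N    (1u << 3)   /* negative */
#define ST_UF   (1u << 4)   /* floating-point underflow */
#define ST_LV   (1u << 5)   /* latched overflow */
#define ST_LUF  (1u << 6)   /* latched floating-point underflow */
#define ST_OVM  (1u << 7)   /* overflow mode */
#define ST_RM   (1u << 8)   /* repeat mode */
                            /* bit 9 is reserved */
#define ST_CF   (1u << 10)  /* cache freeze */
#define ST_CE   (1u << 11)  /* cache enable */
#define ST_CC   (1u << 12)  /* cache clear */
#define ST_GIE  (1u << 13)  /* global interrupt enable */

/* Values used in the examples */
#define ST_INIT_CACHE      (ST_CC | ST_CE)            /* 1800h */
#define ST_INIT_CACHE_GIE  (ST_GIE | ST_CC | ST_CE)   /* 3800h */
```

Because ST_INIT_CACHE leaves ST_OVM clear, loading it into ST (as Example 1-1 does with LDI) also disables overflow mode.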
1.4 Low-Power Mode Interrupt

This section explains how to generate interrupts when the IDLE2 power-down
mode is used.

Executing the IDLE2 instruction causes the H1 and H3 processor clocks
to be held at a constant level until an external interrupt occurs. To
use the IDLE2 power-management feature effectively, interrupts must be
generated whether or not the H1 clock is present. For normal (non-IDLE2)
operation, however, the interrupt inputs must be synchronized with the falling
edge of the H1 clock. An interrupt must satisfy the following conditions:

□ It must meet the setup time on the falling edge of H1.

□ It must be at least one H1 cycle and less than two H1 cycles in duration.

For an interrupt to be recognized during IDLE2 operation and to turn the clocks
back on, it must first be held low for one H1 cycle. The circuit in Figure 1-2,
which uses a 16R4 programmable logic device (PLD), generates an interrupt
signal to the ’C3x with the correct timing during both non-IDLE2 and IDLE2
operation.

Figure 1-2. Interrupt Generation Circuit for Use With IDLE2 Operation 



Example 1-3 shows the PLD equations for the 16R4 in the ABEL™ language.
This implementation makes the following assumptions regarding the
interrupt source:

□ The interrupt source is a low-going pulse or a falling edge. If the interrupt
source stays active for more than one H1 cycle, it is regarded as the same
interrupt request and not a new one.

□ The interrupt source is at least one H1 cycle in duration. One H1 cycle is
required to turn the H1 clock on again.
The interrupt is driven active as soon as the interrupt source goes active. It
goes inactive again on detection of two H3 rising edges. These two rising
edges ensure that the interrupt is recognized both during normal operation and
after the end of IDLE2 operation (when the clocks turn on again). Once the two
H3 edges have been counted, the interrupt stays inactive and does not go active
again until the interrupt source goes inactive and then returns to active.

Example 1-3. State Machine and Equations for the Interrupt Generation 16R4 PLD 


MODULE INTERRUPT_GENERATION 

TITLE' INTERRUPT_GENERATION FOR IDLE2 AND NON-IDLE2 TMS320C31A 
TMS320C31' 

c3xu5 device 'P16R4'; 

"inputs 
h3 Pin 1; 

intsrc_Pin 2; "Interrupt source 
"output 

intx_ Pin 12; "Interrupt input signal to the TMS320C31 

sync_src_Pin 14; "Internal signal used to synchronize the 
"input to the HI clock 

same_ Pin 15; "Keeps track if the new interrupt source 
"has occurred. If active, no new interrupt 
"has occurred. 

"This logic makes the following assumptions: 

"The duration of the interrupt source is at least one HI 
"cycle in duration. It takes one HI cycle to turn the HI 
"clock on again. 

"The interrupt source is pulse- or level-triggered. If the 
"source stays active after being asserted, it is regarded 
"as the same interrupt request and not a new one. 


"Name Substitutions for Test Vectors and Equations 

c,H,L,X = .C.,,1,0,.X.; 

source = !intsrc_; 
sync = !sync_src_; 
samesrc= !same_; 
c3xint = !intx_; 

"state bits 

outstate = [samesrc,sync] ; 
idle = ^bOO; 

sync_st= ^bOl;"synchronize state 

wait = ^bl0;"wait for interrupt source to go inactive 


state_diagram outstate* 
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Example 1-3. State Machine and Equations for the Interrupt Generation 16R4 PLD 
(Continued) 


state idle: 

if (source) then sync_st 
else idle; 


state sync_st: 

if (source) then wait 
else idle; 


state wait: 

if (source) then wait 
else idle; 


equations 

!intx_ = (source # sync) & Isamesrc; 

@page 


"Test interrupt generation logic 
test_vectors

([he, source] -> [outstate, c3xint])

 [ c, L ] -> [idle,    L ];  "check start from idle
 [ L, H ] -> [idle,    H ];  "test normal interrupt operation
 [ c, H ] -> [sync_st, H ];
 [ c, L ] -> [idle,    L ];
 [ c, L ] -> [idle,    L ];
 [ L, H ] -> [idle,    H ];  "test coming out of idle2 operation
 [ L, H ] -> [idle,    H ];
 [ c, H ] -> [sync_st, H ];
 [ c, L ] -> [idle,    L ];
 [ c, H ] -> [sync_st, H ];  "test same source
 [ c, H ] -> [wait,    L ];
 [ c, H ] -> [wait,    L ];
 [ c, L ] -> [idle,    L ];
 [ L, H ] -> [idle,    H ];  "test idle2 operation
 [ L, H ] -> [idle,    H ];
 [ L, H ] -> [idle,    H ];

end interrupt_generation 
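The state diagram, the !intx_ equation, and the test vectors can be cross-checked in software before the PLD is programmed. The following is a minimal Python sketch, not part of the ABEL source and not a PLD simulator: the names next_state, c3xint_out, VECTORS, and run_vectors are hypothetical, the state tuples follow outstate = [samesrc,sync], and a vector whose clock entry is L is assumed to apply no clock edge, so only the combinational equation is evaluated.

```python
# Illustrative model of the interrupt-generation logic; all names are hypothetical.
# States are (samesrc, sync), following outstate = [samesrc,sync].
IDLE, SYNC_ST, WAIT = (0, 0), (0, 1), (1, 0)


def next_state(state, source):
    """One clock edge of the state_diagram."""
    if state == IDLE:
        return SYNC_ST if source else IDLE
    if state in (SYNC_ST, WAIT):
        return WAIT if source else IDLE
    raise ValueError(f"unused state {state}")


def c3xint_out(state, source):
    """c3xint = (source # sync) & !samesrc, the complement of intx_."""
    samesrc, sync = state
    return 1 if (source or sync) and not samesrc else 0


# (clock edge?, source, expected outstate, expected c3xint), as in test_vectors
VECTORS = [
    (True, 0, IDLE, 0),      # check start from idle
    (False, 1, IDLE, 1),     # normal interrupt operation
    (True, 1, SYNC_ST, 1),
    (True, 0, IDLE, 0),
    (True, 0, IDLE, 0),
    (False, 1, IDLE, 1),     # coming out of idle2 operation
    (False, 1, IDLE, 1),
    (True, 1, SYNC_ST, 1),
    (True, 0, IDLE, 0),
    (True, 1, SYNC_ST, 1),   # same source
    (True, 1, WAIT, 0),
    (True, 1, WAIT, 0),
    (True, 0, IDLE, 0),
    (False, 1, IDLE, 1),     # idle2 operation
    (False, 1, IDLE, 1),
    (False, 1, IDLE, 1),
]


def run_vectors(vectors, state=IDLE):
    """Return one pass/fail flag per vector."""
    results = []
    for clocked, source, want_state, want_int in vectors:
        if clocked:
            state = next_state(state, source)
        results.append(state == want_state and c3xint_out(state, source) == want_int)
    return results


print(all(run_vectors(VECTORS)))  # prints True
```

The WAIT rows show the "same source" behavior: a source that stays active after being asserted keeps c3xint inactive until the source goes away.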
Chapter 2


Program Control 


This chapter discusses a group of ’C3x instructions that provide program control 
and facilitate all types of high-speed processing. These instructions handle: 

□ Regular calls 

□ Software stack 

□ Interrupts 

□ Delayed branches 

□ Single- and multiple-instruction loops without any overhead 
Topic

2.1 Subroutines
2.2 Stacks and Queues
2.3 Interrupt Service Routines
2.4 Context Switching in Interrupts and Subroutines
2.5 Delayed Branches
2.6 Repeat Modes
2.7 Computed GOTOs
2.1 Subroutines


The ’C3x has a 24-bit program counter (PC) and a software stack whose size is
limited only by available memory. The CALL and CALLcond instructions increment
the stack pointer and store the next value of the program counter (the return
address) on the stack. At the end of the subroutine, the RETScond instruction
performs a conditional return.

Example 2-1 illustrates how to use a subroutine to determine the dot product
of two vectors. Given two vectors of length N, represented by the arrays
a[0], a[1], ..., a[N-1] and b[0], b[1], ..., b[N-1], the dot product is computed
from the expression

d = a[0]b[0] + a[1]b[1] + ... + a[N-1]b[N-1]

Processing proceeds in the main routine to the point at which the dot product
is to be computed. It is assumed that the arguments of the subroutine have been
appropriately initialized. At this point, a CALL is made to the subroutine, which
transfers control to that section of program memory; when the subroutine has
completed, the RETS instruction returns control to the calling routine. For
Example 2-1, it would suffice to save only register R2; however, more registers
are saved for demonstration purposes. The saved registers are stored on the
system stack, which must be large enough to accommodate the maximum
anticipated storage requirements. You can also use other methods of saving
registers.
Example 2-1. Subroutine Call (Dot Product)

*
*       TITLE SUBROUTINE CALL (DOT PRODUCT)
*
*       MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
*       DOT PRODUCT OF TWO VECTORS
*
        LDI     @blk0,AR0       ; AR0 points to vector a
        LDI     @blk1,AR1       ; AR1 points to vector b
        LDI     N,RC            ; RC contains the number of elements
        CALL    DOT
*
*       SUBROUTINE DOT
*
*       EQUATION: d = a(0)*b(0) + a(1)*b(1) + ... + a(N-1)*b(N-1)
*
*       THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
*       BE GREATER THAN OR EQUAL TO 2.
*
*       ARGUMENT ASSIGNMENTS:
*       ARGUMENT | FUNCTION
*       ---------+----------------------
*       AR0      | ADDRESS OF a(0)
*       AR1      | ADDRESS OF b(0)
*       RC       | LENGTH OF VECTORS (N)
*
*       REGISTERS USED AS INPUT: AR0, AR1, RC
*       REGISTER MODIFIED: R0
*       REGISTER CONTAINING RESULT: R0
*
        .global DOT
*
DOT     PUSH    ST              ; Save status register
        PUSH    R2              ; Use the stack to save R2's
        PUSHF   R2              ; lower 32 and upper 32 bits
        PUSH    AR0             ; Save AR0
        PUSH    AR1             ; Save AR1
        PUSH    RC              ; Save RC
*
*       Initialize R0:
*
        MPYF3   *AR0,*AR1,R0    ; a(0) * b(0) -> R0
        LDF     0.0,R2          ; Initialize R2
        SUBI    2,RC            ; Set RC = N-2
*
*       DOT PRODUCT (1 <= i < N)
*
        RPTS    RC              ; Set up the repeat single
        MPYF3   *++AR0(1),*++AR1(1),R0  ; a(i) * b(i) -> R0
||      ADDF3   R0,R2,R2        ; a(i-1)*b(i-1) + R2 -> R2
*
        ADDF3   R0,R2,R0        ; a(N-1)*b(N-1) + R2 -> R0
*
*       RETURN SEQUENCE
*
        POP     RC              ; Restore RC
        POP     AR1             ; Restore AR1
        POP     AR0             ; Restore AR0
        POPF    R2              ; Restore top 32 bits of R2
        POP     R2              ; Restore bottom 32 bits of R2
        POP     ST              ; Restore ST
        RETS                    ; Return
*
*       end
*
        .end
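The register-level data flow of the DOT subroutine can be checked against a plain dot product with a short model. The Python sketch below is illustrative only (the function name dot_pipelined is hypothetical). Python's tuple assignment stands in for the parallel MPYF3 || ADDF3 pair, modeled so that both operands are read before either result is written; each ADDF3 therefore accumulates the product formed in the previous pass.

```python
# Illustrative model of Example 2-1; dot_pipelined is a hypothetical name.
def dot_pipelined(a, b):
    """Dot product with the same schedule as DOT; requires len(a) == len(b) >= 2."""
    n = len(a)
    if n < 2 or len(b) != n:
        raise ValueError("vectors must have equal length N >= 2")
    r0 = a[0] * b[0]               # MPYF3 *AR0,*AR1,R0
    r2 = 0.0                       # LDF 0.0,R2
    rc = n - 2                     # SUBI 2,RC
    for i in range(1, rc + 2):     # RPTS RC executes RC + 1 = N - 1 times
        r0, r2 = a[i] * b[i], r0 + r2   # MPYF3 ... || ADDF3 R0,R2,R2
    return r0 + r2                 # ADDF3 R0,R2,R0


print(dot_pipelined([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The N >= 2 requirement stated in the listing follows from this structure: one product is formed before the loop, and the repeat count N - 2 must not be negative.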
2.2 Stacks and Queues

The ’C3x provides a dedicated stack pointer (SP) register for building stacks
in memory. The auxiliary registers can also be used to build user stacks and
a variety of more general linear lists. This section discusses the
implementation of the following types of linear lists:

Stack     A linear list for which all insertions and deletions are made
          at one end of the list.

Queue     A linear list for which all insertions are made at one end of
          the list, and all deletions are made at the other end.

Deque     A double-ended queue, for which insertions and deletions can
          be made at either end of the list.


2.2.1 System Stacks 

A stack in the ’C3x fills from a low-memory address to a high-memory address,
as shown in Figure 2-1. A system stack stores addresses and data during
subroutine calls, traps, and interrupts.

Figure 2-1. System Stack Configuration

[Figure: the system stack fills from low memory toward high memory; SP points to the top of the stack]


The stack pointer is a 32-bit register that contains the address of the top of the
system stack. The SP always points to the last element pushed onto the stack.
A push performs a preincrement, and a pop performs a postdecrement of the
SP. Reserve enough stack memory for your software’s anticipated storage
requirements.
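The preincrement/postdecrement behavior can be illustrated with a toy model. This Python sketch works under simplifying assumptions (memory is a list of words, and the SystemStack class is hypothetical); it shows that SP always addresses the most recently pushed word.

```python
# Illustrative model; SystemStack is a hypothetical class, not a TI API.
class SystemStack:
    """Toy model of the 'C3x system stack, which grows toward higher addresses."""

    def __init__(self, words, sp):
        self.mem = [0] * words
        self.sp = sp                  # software must set SP; reset does not

    def push(self, value):            # PUSH: preincrement SP, then store
        self.sp += 1
        self.mem[self.sp] = value

    def pop(self):                    # POP: read, then postdecrement SP
        value = self.mem[self.sp]
        self.sp -= 1
        return value


s = SystemStack(16, sp=3)             # the first push lands at address 4
s.push(0x1234)
s.push(0x5678)
print(s.sp, hex(s.mem[s.sp]))         # 5 0x5678
```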

The stack pointer can be read from as well as written to; multiple stacks can
be created by updating the SP. The SP is not initialized by the hardware during
reset, so you must initialize it to point to a predetermined memory location.
Example 1-1 on page 1-5 shows how to initialize the SP. You must initialize the
stack to a valid free memory space; otherwise, use of the stack can corrupt data
or program memory.

The program counter is pushed onto the system stack on subroutine calls, 
traps, and interrupts. It is popped from the system stack on returns. The PUSH, 
POP, PUSHF, and POPF instructions push and pop the system stack. The 
stack can be used inside subroutines for temporary storage of registers, as in 
Example 2-1 on page 2-3. 

Two instructions, PUSHF and POPF, are for floating-point numbers. These
instructions push and pop floating-point numbers from and to registers R0-R7.
This feature is very useful for saving the extended-precision registers (see
Example 2-1 and Example 2-3). PUSH saves the lower 32 bits of an
extended-precision register, and PUSHF saves the upper 32 bits. To recover
this extended-precision number, execute a POPF followed by a POP. It is
important to perform the integer and floating-point PUSH and POP in the
above order, because POPF forces the eight least significant bits of the
extended-precision register to 0.
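The ordering rule follows from which bits each instruction moves. The Python sketch below models a 40-bit extended-precision register as an integer. Besides the behavior described above (PUSHF and POPF move the upper 32 bits, bits 39-8, and POPF clears bits 7-0), it assumes one detail not stated in this section: an integer POP writes bits 31-0 and leaves bits 39-32 unchanged. All helper names are hypothetical.

```python
# Illustrative model; the helper names below are hypothetical.
MASK32 = 0xFFFF_FFFF
HIGH8 = 0xFF_0000_0000            # bits 39-32 of a 40-bit register


def push_int(reg):
    """PUSH Rn: the stored word is bits 31-0."""
    return reg & MASK32


def push_float(reg):
    """PUSHF Rn: the stored word is bits 39-8."""
    return (reg >> 8) & MASK32


def pop_float(word):
    """POPF Rn: bits 39-8 <- word, bits 7-0 <- 0."""
    return (word & MASK32) << 8


def pop_int(reg, word):
    """POP Rn: bits 31-0 <- word; bits 39-32 assumed unchanged."""
    return (reg & HIGH8) | (word & MASK32)


def save_restore(reg, recommended=True):
    """Save a 40-bit value on a list used as a stack, then restore it."""
    stack = []
    if recommended:                          # PUSH, PUSHF ... POPF, POP
        stack.append(push_int(reg))
        stack.append(push_float(reg))
        restored = pop_float(stack.pop())
        return pop_int(restored, stack.pop())
    stack.append(push_float(reg))            # reversed: PUSHF, PUSH ...
    stack.append(push_int(reg))
    restored = pop_int(0, stack.pop())       # ... POP restores bits 31-0,
    restored = pop_float(stack.pop())        # then POPF zeroes bits 7-0
    return restored


value = 0x12_3456_789A
print(hex(save_restore(value)))          # 0x123456789a
print(hex(save_restore(value, False)))   # 0x1234567800
```

Only the eight least significant bits are lost in the reversed order, because bits 31-8 written by POPF are the same bits that POP had restored.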


2.2.2 User Stacks 


User stacks can be built to store data from low-to-high memory or from
high-to-low memory. Two cases for each type of stack are shown. You can build
stacks by using the preincrement/decrement and postincrement/decrement modes
of modifying the auxiliary registers (AR).

You can implement stack growth from high to low memory in two ways: 

1) Store to memory using *--ARn to push data onto the stack, and read from
memory using *ARn++ to pop data off the stack.

2) Store to memory using *ARn-- to push data onto the stack, and read from
memory using *++ARn to pop data off the stack.

Figure 2-2 illustrates these two cases. The only difference is that in 
Figure 2-2 (a), the AR always points to the top of the stack, and in 
Figure 2-2 (b), the AR always points to the next free location on the stack. 
Figure 2-2. Implementations of High-to-Low Memory Stacks

(a) Store to memory using *--ARn and read from memory using *ARn++
(b) Store to memory using *ARn-- and read from memory using *++ARn

[Figure: in both panels the bottom of the stack is at high memory and the
top of the stack is toward low memory; in (a) ARn points to the top of the
stack, and in (b) ARn points to the next free location toward low memory]


You can implement stack growth from low to high memory in two ways: 

1) Store to memory using *++ARn to push data onto the stack, and read from
memory using *ARn-- to pop data off the stack.

2) Store to memory using *ARn++ to push data onto the stack, and read from
memory using *--ARn to pop data off the stack.

Figure 2-3 illustrates these two cases. In Figure 2-3 (a), the AR always points 
to the top of the stack, and in Figure 2-3 (b), the AR always points to the next 
free location on the stack. 


Figure 2-3. Implementations of Low-to-High Memory Stacks

(a) Store to memory using *++ARn and read from memory using *ARn--
(b) Store to memory using *ARn++ and read from memory using *--ARn

[Figure: in both panels the bottom of the stack is at low memory and the
top of the stack is toward high memory; in (a) ARn points to the top of the
stack, and in (b) ARn points to the next free location toward high memory]
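The four user-stack variants differ only in whether the auxiliary register is modified before or after the memory access. The following Python sketch models them with a hypothetical ARStack class: grows_up selects the direction, and ar_at_top selects variant (a) (the AR points to the top of the stack) or variant (b) (the AR points to the next free location). Memory is a plain list, and no overflow checking is modeled.

```python
# Illustrative model; ARStack is a hypothetical class.
class ARStack:
    """Stack driven by one auxiliary-register pointer (toy model)."""

    def __init__(self, words, ar, grows_up, ar_at_top):
        self.mem = [None] * words
        self.ar = ar
        self.step = 1 if grows_up else -1
        self.ar_at_top = ar_at_top

    def push(self, value):
        if self.ar_at_top:            # *++ARn (up) or *--ARn (down)
            self.ar += self.step
            self.mem[self.ar] = value
        else:                         # *ARn++ (up) or *ARn-- (down)
            self.mem[self.ar] = value
            self.ar += self.step

    def pop(self):
        if self.ar_at_top:            # *ARn-- (up) or *ARn++ (down)
            value = self.mem[self.ar]
            self.ar -= self.step
        else:                         # *--ARn (up) or *++ARn (down)
            self.ar -= self.step
            value = self.mem[self.ar]
        return value


# Figure 2-2 (a): grows toward low memory, AR points to the top of the stack
a = ARStack(8, ar=8, grows_up=False, ar_at_top=True)
a.push("x")
a.push("y")
print(a.ar, a.mem[a.ar])      # 6 y

# Figure 2-3 (b): grows toward high memory, AR points to the next free word
b = ARStack(8, ar=0, grows_up=True, ar_at_top=False)
b.push("x")
b.push("y")
print(b.ar, b.mem[b.ar])      # 2 None
```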
2.2.3 Queues and Double-Ended Queues

The implementation of queues and double-ended queues is based on the same
manipulation of the auxiliary registers that is used for user stacks.

For queues, two auxiliary registers are used: one marks the front of the queue,
from which data is popped, and the other marks the rear of the queue, to which
data is pushed.

For double-ended queues, two auxiliary registers are also necessary. One
register marks one end of the double-ended queue, and the other register
marks the other end. Data can be popped from or pushed onto either end.
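A minimal sketch of both lists, assuming a linear (non-circular) block of memory: each pointer below plays the role of one auxiliary register, and the comments name the indirect addressing mode that would perform each access. The ARQueue and ARDeque classes and their method names are hypothetical.

```python
# Illustrative model; ARQueue and ARDeque are hypothetical classes.
class ARQueue:
    """Queue: 'rear' is where data is pushed, 'front' where it is popped."""

    def __init__(self, words):
        self.mem = [None] * words
        self.front = self.rear = 0

    def push(self, value):          # store with *ARn++ (rear pointer)
        self.mem[self.rear] = value
        self.rear += 1

    def pop(self):                  # read with *ARm++ (front pointer)
        if self.front == self.rear:
            raise IndexError("queue is empty")
        value = self.mem[self.front]
        self.front += 1
        return value


class ARDeque:
    """Deque: 'left' marks the first element, 'right' the next free word."""

    def __init__(self, words):
        self.mem = [None] * words
        self.left = self.right = words // 2   # start mid-block so both ends can grow

    def push_left(self, value):     # *--ARm
        self.left -= 1
        self.mem[self.left] = value

    def push_right(self, value):    # *ARn++
        self.mem[self.right] = value
        self.right += 1

    def pop_left(self):             # *ARm++
        if self.left == self.right:
            raise IndexError("deque is empty")
        value = self.mem[self.left]
        self.left += 1
        return value

    def pop_right(self):            # *--ARn
        if self.left == self.right:
            raise IndexError("deque is empty")
        self.right -= 1
        return self.mem[self.right]


q = ARQueue(8)
for v in (1, 2, 3):
    q.push(v)
print(q.pop(), q.pop())                              # 1 2

d = ARDeque(8)
d.push_right(1)
d.push_left(0)
d.push_right(2)
print(d.pop_left(), d.pop_right(), d.pop_left())     # 0 2 1
```

A practical implementation would also check that the pointers stay inside the reserved block; that check is omitted here for brevity.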
2.3 Interrupt Service Routines

Interrupts on the ’C3x are prioritized and vectored. When an interrupt occurs,
the corresponding flag is set in the interrupt flag (IF) register. If the
corresponding bit in the interrupt enable (IE) register is set and interrupts are
enabled by having the global interrupt enable (GIE) bit in the status register set
to 1, interrupt processing begins. You can also write to the IF register, allowing
you to force an interrupt by software or to clear interrupts without processing them.

2.3.1 Correct Interrupt Programming 

For interrupts to work properly, you must perform the following sequence of
steps, as shown in Example 1-1:

1) Create and place an interrupt-vector table in the appropriate memory 
location. 

2) Initialize the ITTP bit field (’C32 only). 

3) Create a software stack. 

4) Enable the specific interrupt. 

5) Enable global interrupts. 

6) Generate the interrupt signal. 

2.3.2 Software Polling of Interrupts 

The interrupt flag register can be polled and action can be taken, depending
on whether an interrupt has occurred. This is true even when maskable
interrupts are disabled. This can be useful when an interrupt-driven interface is
not implemented. Example 2-2 shows the case in which a subroutine is called
when external interrupt 1 has not occurred.

Example 2-2. Use of Interrupts for Software Polling 

* TITLE INTERRUPT POLLING 


        TSTB    02H,IF          ; Test if INT1 (IF bit 1) has occurred
        CALLZ   SUBROUTINE      ; If not, call subroutine


When interrupt processing begins, the program counter (PC) is pushed onto the
stack, and the interrupt vector is loaded into the PC. Interrupts are then disabled
by clearing the GIE bit to 0, and the program continues from the address loaded
in the PC. Since all interrupts are disabled, interrupt processing can proceed
without further interruption, unless the interrupt service routine reenables
interrupts.


2.3.3 Interrupt Priority 

Interrupts on the ’C3x are automatically prioritized. This allows interrupts that
occur simultaneously to be serviced in a predefined order. Infrequent (but
lengthy) interrupt service routines (ISRs) might need to be interrupted by more
frequently occurring interrupts. In Example 2-3, the ISR for INT2 temporarily
modifies the IE register to permit interrupt processing when an interrupt to
INT0 (but no other interrupt) occurs. When the routine finishes processing, the
IE register is restored to its original state. The RETIcond instruction not only
pops the next program counter address from the stack, but also sets the GIE
bit of the status register. This enables all interrupts that have their interrupt
enable bit set.
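The acceptance rule from the start of this section and the priority behavior can be summarized in a few lines. This Python sketch is a simplified model, not a cycle-accurate one: it assumes that a lower IF bit number means higher priority (INT0 ahead of INT1, and so on), uses GIE = 2000h as in the ENABLE constant of Example 2-3, and ignores pipeline timing. The function names next_interrupt and take_interrupt are hypothetical.

```python
# Illustrative model; next_interrupt and take_interrupt are hypothetical names.
GIE = 0x2000          # GIE bit of ST (the ENABLE constant in Example 2-3)


def next_interrupt(st, ie, if_reg):
    """IF bit number of the interrupt that would be taken, or None."""
    if not st & GIE:
        return None                   # interrupts globally disabled
    pending = ie & if_reg             # flag set AND individually enabled
    if pending == 0:
        return None
    # Assumed priority: the lowest-numbered pending bit wins.
    return (pending & -pending).bit_length() - 1


def take_interrupt(st, pc, stack, vector):
    """Push the PC, clear GIE, and continue at the interrupt vector."""
    stack.append(pc)
    return st & ~GIE, vector


# INT1 and INT2 pending, both enabled: INT1 is serviced first.
print(next_interrupt(GIE, 0b0110, 0b0110))   # 1

# Inside ISR2 of Example 2-3: IE = MASK (INT0 only), GIE set again.
print(next_interrupt(GIE, 0b0001, 0b0010))   # None (INT1 stays pending)
print(next_interrupt(GIE, 0b0001, 0b0011))   # 0
```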

Example 2-3. Interrupt Service Routine

*       TITLE INTERRUPT SERVICE ROUTINE
*
        .global ISR2
ENABLE  .set    2000h
MASK    .set    1
*
*       INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2
*
ISR2:   PUSH    ST              ; Save status register
        PUSH    DP              ; Save data page pointer
        PUSH    IE              ; Save interrupt enable register
        PUSH    R0              ; Save lower 32 bits and
        PUSHF   R0              ; upper 32 bits of R0
        PUSH    R1              ; Save lower 32 bits and
        PUSHF   R1              ; upper 32 bits of R1
        LDI     MASK,IE         ; Unmask only INT0
        OR      ENABLE,ST       ; Enable all interrupts
*
*       MAIN PROCESSING SECTION FOR ISR2
*
        XOR     ENABLE,ST       ; Disable all interrupts
        POPF    R1              ; Restore upper 32 bits and
        POP     R1              ; lower 32 bits of R1
        POPF    R0              ; Restore upper 32 bits and
        POP     R0              ; lower 32 bits of R0
        POP     IE              ; Restore interrupt enable register
        POP     DP              ; Restore data page register
        POP     ST              ; Restore status register
        RETI                    ; Return and enable interrupts


2.4 Context Switching in Interrupts and Subroutines 

Context switching is commonly required during the processing of subroutine
calls or interrupts. It can be extensive or simple, depending on system
requirements. On the ’C3x, the program counter is automatically pushed onto
the stack. Important information in other ’C3x registers, such as the status,
auxiliary, or extended-precision registers, must be saved explicitly (for
example, with PUSH and PUSHF instructions). To preserve the state of the
status register, push it first and pop it last. This keeps the restoration of the
extended-precision registers from affecting the status register.

Example 2-4 on page 2-13 and Example 2-5 on page 2-15 show saving and
restoring the context of the ’C3x. In both examples, the stack expands toward
higher addresses and is used for saving the registers. If you do not want to use
the stack pointed at by SP, you can create a separate stack by using an
auxiliary register as the stack pointer. Registers saved in these examples are:

□ Extended-precision registers (R7 through R0)

□ Auxiliary registers (AR7 through AR0)

□ Data-page pointer (DP)

□ Index registers (IR0 and IR1)

□ Block-size register (BK)

□ Status register (ST)

□ Interrupt-related registers (IE and IF)

□ I/O flag (IOF)

□ Repeat-related registers (RS, RE, and RC)

You need to preserve only the registers that your subroutine or interrupt/trap
service routine modifies and that could affect the previous context
environment.


If the previous context environment was in C, then your program must perform
one of two tasks:

□ If the program is in a subroutine, it must preserve the dedicated C registers
as follows:

    Save as Integers          Save as Floating-Point
    R4
    R5
    AR4
    AR5
    AR6
    AR7
    FP
    DP (small model only)
    SP

□ If the program is in an interrupt service routine, it must preserve all of the
’C3x registers (see Example 2-4 and Example 2-5).

If the previous context environment was in assembly language, you must
determine which registers to save, based on the operations of your
assembly-language code.

Note:

The status register must be saved first and restored last to preserve the
processor status without further change caused by other context-switching
instructions.


Example 2-4. Context Save for the TMS320C3x

*
*       TITLE CONTEXT SAVE FOR THE TMS320C3X
*
        .global SAVE
*
*       CONTEXT SAVE ON SUBROUTINE CALL OR INTERRUPT
*
SAVE:
        PUSH    ST              ; Save status register
*
*       SAVE THE EXTENDED-PRECISION REGISTERS
*
        PUSH    R0              ; Save the lower 32 bits
        PUSHF   R0              ; and the upper 32 bits of R0
        PUSH    R1              ; Save the lower 32 bits
        PUSHF   R1              ; and the upper 32 bits of R1
        PUSH    R2              ; Save the lower 32 bits
        PUSHF   R2              ; and the upper 32 bits of R2
        PUSH    R3              ; Save the lower 32 bits
        PUSHF   R3              ; and the upper 32 bits of R3
        PUSH    R4              ; Save the lower 32 bits
        PUSHF   R4              ; and the upper 32 bits of R4
        PUSH    R5              ; Save the lower 32 bits
        PUSHF   R5              ; and the upper 32 bits of R5
        PUSH    R6              ; Save the lower 32 bits
        PUSHF   R6              ; and the upper 32 bits of R6
        PUSH    R7              ; Save the lower 32 bits
        PUSHF   R7              ; and the upper 32 bits of R7
*
*       SAVE THE AUXILIARY REGISTERS
*
        PUSH    AR0             ; Save AR0
        PUSH    AR1             ; Save AR1
        PUSH    AR2             ; Save AR2
        PUSH    AR3             ; Save AR3
        PUSH    AR4             ; Save AR4
        PUSH    AR5             ; Save AR5
        PUSH    AR6             ; Save AR6
        PUSH    AR7             ; Save AR7
*
*       SAVE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        PUSH    DP              ; Save data page pointer
        PUSH    IR0             ; Save index register IR0
        PUSH    IR1             ; Save index register IR1
        PUSH    BK              ; Save block-size register
        PUSH    IE              ; Save interrupt enable register
        PUSH    IF              ; Save interrupt flag register
        PUSH    IOF             ; Save I/O flag register
        PUSH    RS              ; Save repeat start address
        PUSH    RE              ; Save repeat end address
        PUSH    RC              ; Save repeat counter
*
*       SAVE IS COMPLETE
*


Example 2-5. Context Restore for the TMS320C3x

*
*       TITLE CONTEXT RESTORE FOR THE TMS320C3X
*
        .global RESTR
*
*       CONTEXT RESTORE AT THE END OF A SUBROUTINE CALL OR INTERRUPT
*
RESTR:
*
*       RESTORE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        POP     RC              ; Restore repeat counter
        POP     RE              ; Restore repeat end address
        POP     RS              ; Restore repeat start address
        POP     IOF             ; Restore I/O flag register
        POP     IF              ; Restore interrupt flag register
        POP     IE              ; Restore interrupt enable register
        POP     BK              ; Restore block-size register
        POP     IR1             ; Restore index register IR1
        POP     IR0             ; Restore index register IR0
        POP     DP              ; Restore data page pointer
*
*       RESTORE THE AUXILIARY REGISTERS
*
        POP     AR7             ; Restore AR7
        POP     AR6             ; Restore AR6
        POP     AR5             ; Restore AR5
        POP     AR4             ; Restore AR4
        POP     AR3             ; Restore AR3
        POP     AR2             ; Restore AR2
        POP     AR1             ; Restore AR1
        POP     AR0             ; Restore AR0
*
*       RESTORE THE EXTENDED-PRECISION REGISTERS
*
        POPF    R7              ; Restore the upper 32 bits and
        POP     R7              ; the lower 32 bits of R7
        POPF    R6              ; Restore the upper 32 bits and
        POP     R6              ; the lower 32 bits of R6
        POPF    R5              ; Restore the upper 32 bits and
        POP     R5              ; the lower 32 bits of R5
        POPF    R4              ; Restore the upper 32 bits and
        POP     R4              ; the lower 32 bits of R4
        POPF    R3              ; Restore the upper 32 bits and
        POP     R3              ; the lower 32 bits of R3
        POPF    R2              ; Restore the upper 32 bits and
        POP     R2              ; the lower 32 bits of R2
        POPF    R1              ; Restore the upper 32 bits and
        POP     R1              ; the lower 32 bits of R1
        POPF    R0              ; Restore the upper 32 bits and
        POP     R0              ; the lower 32 bits of R0
*
        POP     ST              ; Restore status register
*
*       RESTORE IS COMPLETE
*


2.5 Delayed Branches 

The ’C3x uses delayed branches to create single-cycle branching. The 
delayed branches operate like regular branches but do not flush the pipeline. 
Instead, the three instructions following a delayed branch are also executed. 
As discussed in the Program Flow Control chapter of the TMS320C3x User’s 
Guide, the only limitations are that none of the three instructions following a 
delayed branch may be a: 

□ Branch (standard or delayed) 

□ Call to a subroutine 

□ Return from a subroutine 

□ Return from an interrupt 

□ Repeat instruction 

□ TRAP instruction 

□ IDLE instruction 

Conditional delayed branches use the conditions that exist at the end of the
instruction immediately preceding the delayed branch. Sometimes a branch
is necessary in the flow of a program, but fewer than three useful instructions
can be placed after a delayed branch. For faster execution, it is still
advantageous to use a delayed branch. This is shown in Example 2-6, in which a
no-operation instruction (NOP) takes the place of the unused instruction slot.
The trade-off is more instruction words for less execution time.


Example 2-6. Delayed Branch Execution

*       TITLE DELAYED BRANCH EXECUTION
*
        LDF     *+AR1(5),R2     ; Load contents of memory to R2
        BGED    SKIP            ; If loaded number >= 0, branch (delayed)
        LDFN    R2,R1           ; If loaded number < 0, load it to R1
        SUBF    3.0,R1          ; Subtract 3 from R1
        NOP                     ; Dummy operation to complete delayed
                                ; branch
        MPYF    1.5,R1          ; Continue here if loaded number < 0
SKIP    LDF     R1,R3           ; Continue here if loaded number >= 0
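Because the three instructions after BGED always execute, Example 2-6 behaves differently from a non-delayed branch on both paths. The following Python sketch traces the example for one loaded value; it is an illustration, not a pipeline simulator. The function name delayed_branch_example and the parameter r1_before (a stand-in for whatever R1 held beforehand) are hypothetical, and both BGED and LDFN are assumed to see the flags set by the LDF that precedes the branch.

```python
# Illustrative trace of Example 2-6; names are hypothetical.
def delayed_branch_example(x, r1_before=0.0):
    """Return (R1, R3) after running Example 2-6 on the loaded value x."""
    r2 = x                              # LDF *+AR1(5),R2 (sets the N flag)
    negative = r2 < 0.0                 # flags seen by BGED and LDFN
    r1 = r2 if negative else r1_before  # LDFN R2,R1  (delay slot 1)
    r1 = r1 - 3.0                       # SUBF 3.0,R1 (delay slot 2)
    # NOP                               # (delay slot 3)
    if negative:                        # BGED not taken: fall through
        r1 = r1 * 1.5                   # MPYF 1.5,R1
    r3 = r1                             # SKIP: LDF R1,R3
    return r1, r3


print(delayed_branch_example(-2.0))        # (-7.5, -7.5)
print(delayed_branch_example(4.0, 10.0))   # (7.0, 7.0)
```

The second call shows that SUBF executes even when the branch is taken, because it sits in a delay slot.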


2.6 Repeat Modes 

The ’C3x supports looping without any overhead. For that purpose, there are
two instructions: RPTB, which repeats a block of code, and RPTS, which
repeats a single instruction. Three control registers, repeat start address (RS),
repeat end address (RE), and repeat counter (RC), contain the parameters that
specify loop execution. See the Program Flow Control chapter in the TMS320C3x
User’s Guide for a complete description of RPTB and RPTS. When RPTB and RPTS
execute, they set the RS and RE registers automatically. RPTS also loads RC
from its operand; for RPTB, however, you must load the repeat counter register
yourself before the loop.


2.6.1 Block Repeat 


Example 2-7 shows an application of the block repeat construct. In this
example, an array of 64 elements is flipped over by exchanging each pair of
elements that are equidistant from the two ends of the array. In other words,
the original array is:

a(1),a(2),...,a(31),a(32),...,a(64)

The final array after the rearrangement is as follows:

a(64),a(63),...,a(32),a(31),...,a(1)

Because each pass of the loop exchanges two elements, 32 passes are
required, so the repeat counter register is initialized to 31. In general, if RC
contains the number N, the loop is executed N + 1 times. The loop is defined
by the RPTB instruction and the EXCH label.


Example 2-7. Loop Using Block Repeat

*       TITLE LOOP USING BLOCK REPEAT
*
*       THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT ARE
*       SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI     @ADDR,AR0       ; AR0 points to the beginning of the array
        LDI     AR0,AR1
        ADDI    63,AR1          ; AR1 points to the end of the
                                ; 64-element array
*
        LDI     31,RC           ; Initialize repeat counter
        RPTB    EXCH            ; Repeat RC+1 times between here and
*                               ; EXCH
        LDI     *AR0,R0         ; Load one memory element in R0,
||      LDI     *AR1,R1         ; and the other in R1
EXCH    STI     R1,*AR0++(1)    ; Then, exchange their locations
||      STI     R0,*AR1--(1)
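The loop count can be checked with a short model of Example 2-7. This Python sketch (the function name rptb_exchange is hypothetical) performs the same pointer updates, runs the block RC + 1 times, and confirms that 32 passes reverse a 64-element array.

```python
# Illustrative model of Example 2-7; rptb_exchange is a hypothetical name.
def rptb_exchange(mem, base, length=64):
    """Reverse mem[base:base+length] in place; return the number of passes."""
    ar0 = base                       # LDI @ADDR,AR0
    ar1 = base + length - 1          # LDI AR0,AR1 / ADDI 63,AR1
    rc = length // 2 - 1             # LDI 31,RC
    passes = 0
    for _ in range(rc + 1):          # RPTB EXCH: RC + 1 passes
        r0, r1 = mem[ar0], mem[ar1]  # LDI *AR0,R0 || LDI *AR1,R1
        mem[ar0], mem[ar1] = r1, r0  # STI R1,*AR0++(1) || STI R0,*AR1--(1)
        ar0 += 1
        ar1 -= 1
        passes += 1
    return passes


data = list(range(1, 65))            # a(1) ... a(64)
print(rptb_exchange(data, 0), data[:3], data[-3:])   # 32 [64, 63, 62] [3, 2, 1]
```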




The Program Flow Control chapter in the TMS320C3x User’s Guide discusses
restrictions on the block-repeat construct. At the end of the loop, the program
counter is modified according to the contents of registers RS, RE, and RC.
Therefore, no operation at the end of the loop should attempt to modify the
repeat counter or the program counter.

It is possible to nest repeat blocks; however, there is only one set of control
registers: RS, RE, and RC. It is necessary to save these registers before entering
an inner loop. Alternatively, you can implement one of the loops by using a
register as a counter and a delayed branch, rather than nesting repeat blocks.

Example 2-8 shows how to use the block repeat to find the maximum of 147
numbers.


Example 2-8. Use of Block Repeat to Find a Maximum

*       TITLE USE OF BLOCK REPEAT TO FIND A MAXIMUM
*       THIS ROUTINE FINDS THE MAXIMUM OF N = 147 NUMBERS.
*
        LDI     146,RC          ; Initialize repeat counter to 147-1
        LDI     @ADDR,AR0       ; AR0 points to beginning of array
        LDF     *AR0,R0         ; Initialize MAX to the first value
*                               ; (AR0 is not advanced, so the loop
*                               ; compares all 147 numbers)
        RPTB    LOOP
        CMPF    *AR0++(1),R0    ; Compare number to the maximum
LOOP    LDFLT   *-AR0(1),R0     ; If greater, this is a new maximum
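A quick way to check the pointer arithmetic of a block-repeat search is to model it with a list that raises an error on any out-of-range read. In this Python sketch (the function name block_repeat_max is hypothetical), the first value is loaded without advancing the pointer, so RC = N - 1 = 146 gives 147 passes that each compare one element exactly once; the function also returns the final AR0 offset.

```python
# Illustrative model of a block-repeat maximum search; names are hypothetical.
def block_repeat_max(values):
    """Return (maximum, final AR0 offset) for the loop described above."""
    ar0 = 0
    r0 = values[ar0]                 # load the first value, AR0 not advanced
    rc = len(values) - 1             # 146 for N = 147
    for _ in range(rc + 1):          # RPTB LOOP: RC + 1 passes
        x = values[ar0]              # CMPF *AR0++(1),R0 ...
        ar0 += 1                     # ... post-increment
        if r0 < x:                   # LT condition: R0 - x < 0
            r0 = values[ar0 - 1]     # LDFLT *-AR0(1),R0
    return r0, ar0


nums = [float((37 * i) % 147) for i in range(147)]   # a permutation of 0..146
print(block_repeat_max(nums))        # (146.0, 147)
```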


2.6.2 Single-Instruction Repeat 

The single-instruction repeat uses the control registers RS, RE, and RC in the 
same way as the block repeat. The advantage over the block repeat is that the 
instruction is fetched only once, and then the buses are available for moving 
operands. The single-instruction repeat construct is not interruptible; the block 
repeat is interruptible. 

Example 2-9 shows an application of the single-repeat construct. In this
example, the sum of the products of two arrays is computed. The arrays are not
necessarily different. If the arrays are a(i) and b(i), each of length N = 512,
then register R0 contains this quantity after computation:

a(1)b(1) + a(2)b(2) + ... + a(N)b(N)

The first product is computed before the repeated instruction, and the last
product is added after it, so the repeated instruction must execute N - 1 = 511
times. Because the instruction is executed n + 1 times when the repeat count
is n, the count specified in the RPTS instruction is N - 2 = 510.

Example 2-9. Loop Using Single Repeat

*       TITLE LOOP USING SINGLE REPEAT
*       THIS CODE SEGMENT COMPUTES SUM[a(i)b(i)] FOR i = 1 TO N.
*
        LDI     @ADDR1,AR0      ; AR0 points to array a(i)
        LDI     @ADDR2,AR1      ; AR1 points to array b(i)
        LDF     0.0,R0          ; Initialize R0
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute first product
        RPTS    510             ; Repeat 511 times (N-1)
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute next product
||      ADDF3   R1,R0,R0        ; and accumulate the previous one
        ADDF    R1,R0           ; One final addition




2.7 Computed GOTOs 

It is occasionally convenient to select the subroutine to be executed at run
time (and not during assembly). The ’C3x supports this selection with a
computed GOTO, which is implemented using the CALLcond instruction in the
register-addressing mode. In this mode, the instruction uses the contents of
the register as the address of the call. Example 2-10 shows a computed
GOTO for a task controller.

Example 2-10. Computed GOTO

*       TITLE COMPUTED GOTO
*
*       TASK CONTROLLER
*
*       THIS MAIN ROUTINE CONTROLS THE ORDER OF TASK EXECUTION (6 TASKS
*       IN THE PRESENT EXAMPLE). TASK0 THROUGH TASK5 ARE THE NAMES OF
*       SUBROUTINES TO BE CALLED. THEY ARE EXECUTED IN ORDER, TASK0,
*       TASK1, ..., TASK5. WHEN AN INTERRUPT OCCURS, THE INTERRUPT
*       SERVICE ROUTINE IS EXECUTED, AND THE PROCESSOR CONTINUES
*       WITH THE INSTRUCTION FOLLOWING THE IDLE INSTRUCTION. THIS
*       ROUTINE SELECTS THE TASK APPROPRIATE FOR THE CURRENT CYCLE,
*       CALLS THE TASK AS A SUBROUTINE, AND BRANCHES BACK TO THE IDLE
*       TO WAIT FOR THE NEXT SAMPLE INTERRUPT WHEN THE SCHEDULED TASK
*       HAS COMPLETED EXECUTION. R0 HOLDS THE OFFSET FROM THE BASE
*       ADDRESS OF THE TASK TO BE EXECUTED.
*
        LDI     5,R0            ; Initialize R0
        LDI     @ADDR,AR1       ; AR1 holds base address of the table
WAIT    IDLE                    ; Wait for the next interrupt
        ADDI3   AR1,R0,AR2      ; Add the base address to the table
*                               ; entry number
        SUBI    1,R0            ; Decrement R0
        LDILT   5,R0            ; If R0 < 0, reinitialize it to 5
        LDI     *AR2,R1         ; Load the task address
        CALLU   R1              ; Execute appropriate task
*
        BR      WAIT
*
TSKSEQ  .word   TASK5           ; Address of TASK5
        .word   TASK4           ; Address of TASK4
        .word   TASK3           ; Address of TASK3
        .word   TASK2           ; Address of TASK2
        .word   TASK1           ; Address of TASK1
        .word   TASK0           ; Address of TASK0
ADDR    .word   TSKSEQ
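The table walk of the task controller can be traced without hardware. This Python sketch (the function name task_schedule is hypothetical) treats each loop pass as one wake-up from IDLE and indexes the task table by the offset in R0 (the sketch models only the offset from the table base). It shows why a table stored in reverse order, with TASK5 first, produces the sequence TASK0, TASK1, ..., TASK5.

```python
# Illustrative trace of the computed-GOTO task controller; names are hypothetical.
TSKSEQ = ["TASK5", "TASK4", "TASK3", "TASK2", "TASK1", "TASK0"]


def task_schedule(wakeups, table=TSKSEQ):
    """Return the task names called after the given number of IDLE wake-ups."""
    r0 = 5                           # LDI 5,R0
    called = []
    for _ in range(wakeups):         # WAIT: IDLE until an interrupt
        ar2 = r0                     # table base + R0 (offset only here)
        r0 -= 1                      # SUBI 1,R0
        if r0 < 0:                   # LDILT 5,R0
            r0 = 5
        called.append(table[ar2])    # LDI *AR2,R1 / CALLU R1
    return called


print(task_schedule(7))
# ['TASK0', 'TASK1', 'TASK2', 'TASK3', 'TASK4', 'TASK5', 'TASK0']
```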
Chapter 3


Logical and Arithmetic Operations 


This chapter describes the ’C3x instruction set, which supports both integer and
floating-point arithmetic and logical operations. These instructions can be
combined to form more complex operations.

Topic

3.1 Bit Manipulation
3.2 Block Moves
3.3 Bit-Reversed Addressing
3.4 Integer and Floating-Point Division
3.5 Square Root Computation
3.6 Extended-Precision Arithmetic
3.7 IEEE/TMS320C3x Floating-Point Format Conversion


3.1 Bit Manipulation 

Instructions for logical operations, such as AND, OR, NOT, ANDN, and XOR,
can be used with the shift instructions for bit manipulation. A special
instruction, TSTB, tests bits. TSTB performs the same operation as AND, but
the result of the logical AND is used only to set the condition flags and is not
written anywhere. Example 3-1 and Example 3-2 demonstrate the use of these
instructions for bit manipulation and testing.

Example 3-1. Use of TSTB for Software-Controlled Interrupt

*       TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
*       IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY
*       RESETTING THE GIE BIT OF THE STATUS REGISTER. WHEN AN
*       INTERRUPT ARRIVES, IT IS STORED IN THE IF REGISTER. THE
*       PRESENT EXAMPLE ACTIVATES THE INTERRUPT SERVICE ROUTINE INTR
*       WHEN IT DETECTS THAT INT2 HAS OCCURRED.
*
        TSTB    0100b,IF        ; Check if bit 2 of IF is set,
        CALLNZ  INTR            ; and, if so, call subroutine INTR




Example 3-2. Copy a Bit From One Location to Another

*       TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
*       BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2.
*       AR0 POINTS TO A LOCATION HOLDING I, AND IT IS ASSUMED THAT THE
*       NEXT MEMORY LOCATION HOLDS THE VALUE J.
*
*       [Figure: bit I of R1 is copied to bit J of R2; I is stored
*       at *AR0 and J at *(AR0+1)]
*
*       THE THREE INSTRUCTIONS AFTER BZD EXECUTE WHETHER OR NOT
*       THE BRANCH IS TAKEN.
*
        LDI     1,R0
        LSH     *AR0,R0         ; Shift 1 to align it with bit I
        TSTB    R1,R0           ; Test the Ith bit of R1
        BZD     CONT            ; If bit = 0, branch delayed
        LDI     1,R0
        LSH     *+AR0(1),R0     ; Align 1 with Jth location
        ANDN    R0,R2           ; If bit = 0, reset Jth bit of R2
        OR      R0,R2           ; If bit = 1, set Jth bit of R2
CONT
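In Example 3-2, the three instructions after BZD always execute, so bit J of R2 is always cleared first and is set again only when bit I of R1 is 1. The following Python sketch mirrors that sequence on 32-bit values; the function name copy_bit is hypothetical, and the I and J values are passed directly instead of being read through AR0.

```python
# Illustrative model of Example 3-2; copy_bit is a hypothetical name.
MASK32 = 0xFFFF_FFFF


def copy_bit(r1, r2, i, j):
    """Copy bit i of r1 into bit j of r2, following Example 3-2."""
    r0 = 1 << i                      # LDI 1,R0 / LSH *AR0,R0
    bit_is_zero = (r1 & r0) == 0     # TSTB R1,R0 sets the Z flag
    # BZD CONT: the next three instructions execute either way.
    r0 = 1 << j                      # LDI 1,R0 / LSH *+AR0(1),R0
    r2 &= ~r0                        # ANDN R0,R2: clear bit J
    if not bit_is_zero:              # branch not taken
        r2 |= r0                     # OR R0,R2: set bit J
    return r2 & MASK32               # CONT


print(hex(copy_bit(0b1000, 0, 3, 5)))        # 0x20
print(hex(copy_bit(0, 0xFFFF_FFFF, 3, 5)))   # 0xffffffdf
```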


3.2 Block Moves 


Since the ’C3x can address a large amount of memory, blocks of data or
program code can be stored off-chip in slow memories and then loaded on-chip
for faster execution. Data can also be moved from on-chip to off-chip memory
for storage or for multiprocessor data transfers.

You can use direct memory access (DMA) in parallel with CPU operations to
accomplish such data transfers. The DMA operation is explained in detail in
the Programming the DMA Coprocessor chapter later in this book. An alternative
to DMA is to perform data transfers under program control using load and store
instructions in a repeat mode. Example 3-3 shows the transfer of a block of
512 floating-point numbers from external memory to block 1 of the on-chip
RAM.

Example 3-3. Block Move Under Program Control 


* TITLE BLOCK MOVE UNDER PROGRAM CONTROL
*
extern  .word   01000H
block1  .word   0809C00H

        LDI     @extern,AR0     ; Source address
        LDI     @block1,AR1     ; Destination address
        LDF     *AR0++,R0       ; Load the first number
        RPTS    510             ; Repeat following instruction 511 times
        LDF     *AR0++,R0       ; Load the next number, and...
||      STF     R0,*AR1++       ; store the previous one
        STF     R0,*AR1         ; Store the last number
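The load-ahead structure of Example 3-3 (one load before the loop, a parallel load and store inside it, and one store after it) can be modeled in C as follows. This is a hypothetical host-side sketch for checking the instruction count, not ’C3x code, and pipelined_copy is an illustrative name. For n words, the loop body runs n - 1 times, which is why the listing uses RPTS 510 (511 passes) for 512 numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the load-ahead loop in Example 3-3.
   Returns the number of loop passes so the count can be checked. */
static size_t pipelined_copy(float *dst, const float *src, size_t n)
{
    assert(n >= 1);
    float r0 = *src++;                    /* LDF *AR0++,R0 (first number)  */
    size_t passes = 0;
    for (size_t k = 0; k + 1 < n; k++) {  /* RPTS n-2 gives n-1 passes     */
        float next = *src++;              /* LDF *AR0++,R0 ...             */
        *dst++ = r0;                      /* || STF R0,*AR1++ (old R0)     */
        r0 = next;
        passes++;
    }
    *dst = r0;                            /* STF R0,*AR1 (last number)     */
    return passes;
}
```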


3.3 Bit-Reversed Addressing 

The ’C3x can implement fast Fourier transforms (FFTs) with bit-reversed
addressing. If the data to be transformed is in the correct order, the final result
of the FFT is presented in bit-reversed order. To recover the frequency-domain
data in the correct order, you would have to swap certain memory locations. The
bit-reversed addressing mode makes this swapping unnecessary: the next time
the data is accessed, the access is performed in bit-reversed order rather than
sequentially. The base address of bit-reversed addressing must be located on a
boundary the size of the table. For example, if IR0 = 2^n, the n least significant
bits (LSBs) of the base address must be 0.

In bit-reversed addressing, IR0 holds a value equal to one half the size of the
FFT if real and imaginary data are stored in separate arrays. During each access,
the auxiliary register is indexed by IR0, but with reverse carry propagation.
Example 3-4 illustrates a 512-point complex FFT being moved from the place
of computation (pointed at by AR0) to a location pointed at by AR1. In this
example, the real and imaginary parts, XR(i) and XI(i), of the data are not stored
in separate arrays. They are interleaved as XR(0), XI(0), XR(1), XI(1), ...,
XR(N-1), XI(N-1). Because of this arrangement, the length of the array is 2N
instead of N, and IR0 is set to 512 instead of 256.
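Reverse carry propagation can be modeled by reversing the bit order of the address and the index, adding them normally, and reversing the sum back. The following C sketch assumes a table of 2^nbits words whose base has its nbits LSBs equal to 0, and it models only those low bits. The name bitrev_next is hypothetical, and the sketch makes no claim about how the hardware behaves outside that table.

```c
#include <assert.h>

/* Reverse the order of the low nbits bits of x. */
static unsigned reverse_bits(unsigned x, unsigned nbits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < nbits; b++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Hypothetical model of "*ARn++(IR0)B": add ir0 to addr with the carry
   propagating from the MSB toward the LSB of the nbits-wide table index. */
static unsigned bitrev_next(unsigned addr, unsigned ir0, unsigned nbits)
{
    assert(nbits > 0 && nbits < 32);
    unsigned mask = (1u << nbits) - 1u;
    unsigned base = addr & ~mask;             /* aligned table base, untouched */
    unsigned sum  = reverse_bits(addr & mask, nbits)
                  + reverse_bits(ir0 & mask, nbits);
    return base | reverse_bits(sum & mask, nbits);
}
```

With nbits = 3 and IR0 = 4 (an 8-point FFT with separate real and imaginary arrays), starting from 0 the sketch visits 0, 4, 2, 6, 1, 5, 3, 7. With interleaved data, as in Example 3-4, nbits = 4 and IR0 = 8 visit the real parts at 0, 8, 4, 12, 2, 10, 6, 14.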


Example 3-4. Bit-Reversed Addressing 


*
* TITLE BIT-REVERSED ADDRESSING
*
* THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT
* COMPUTATION POINTED AT BY AR0 TO A LOCATION POINTED AT
* BY AR1. REAL AND IMAGINARY POINTS ARE ALTERNATING.
*
        LDI     512,IR0         ; IR0 = 512 (interleaved data)
        LDI     2,IR1           ; AR1 steps by 2 (one complex point)
        LDI     511,RC          ; Repeat 511+1 times
        LDF     *+AR0(1),R1     ; Load first imaginary point
*
        RPTB    LOOP
        LDF     *AR0++(IR0)B,R0 ; Load real value (and point
||      STF     R1,*+AR1(1)     ; to next location) and store
                                ; the imaginary value
LOOP    LDF     *+AR0(1),R1     ; Load next imaginary point and store
||      STF     R0,*AR1++(IR1)  ; previous real value


3.4 Integer and Floating-Point Division 

Although division is not implemented as a single instruction in the ’C3x, the
instruction set can perform an efficient division routine. Integer and
floating-point division are examined separately because each uses a different
algorithm.

3.4.1 Integer Division 

Division is implemented on the ’C3x by repeated subtractions using SUBC, a
special conditional subtract instruction. Consider the case of a 32-bit positive
dividend with i significant bits (and 32 - i sign bits), as well as a 32-bit positive
divisor with j significant bits (and 32 - j sign bits). Repeating the SUBC
instruction i - j + 1 times produces a 32-bit result in which the lower
i - j + 1 bits are the quotient and the upper 31 - i + j bits are the remainder
of the division.

SUBC implements binary division in the same manner as long division. The
divisor, which is assumed to be smaller than the dividend, is shifted left i - j
times to align it with the dividend. Using SUBC, the shifted divisor is subtracted
from the dividend. For each subtraction that does not produce a negative answer,
the dividend is replaced by the difference, which is then shifted to the left with
a 1 put in the LSB. If the difference is negative, the dividend is simply shifted
left by 1, leaving a 0 in the LSB. This operation is repeated i - j + 1 times.
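The SUBC step described above can be expressed in C as follows. This is an illustrative model, assuming positive operands well below 2^31 so that unsigned comparisons behave like the signed test that SUBC performs. The names subc and subc_divide are hypothetical, and i and j are passed in rather than computed.

```c
#include <assert.h>
#include <stdint.h>

/* One SUBC step: keep a nonnegative difference, shift it left and set
   the LSB; otherwise shift the dividend left, leaving 0 in the LSB. */
static uint32_t subc(uint32_t dividend, uint32_t aligned_divisor)
{
    if (dividend >= aligned_divisor)
        return ((dividend - aligned_divisor) << 1) | 1u;
    return dividend << 1;
}

/* Divide a dividend with i significant bits by a divisor with j
   significant bits (1 <= j <= i < 32) by repeating SUBC i - j + 1 times.
   Returns the raw 32-bit result word. */
static uint32_t subc_divide(uint32_t dividend, uint32_t divisor,
                            unsigned i, unsigned j,
                            uint32_t *quotient, uint32_t *remainder)
{
    assert(divisor != 0 && j >= 1 && j <= i && i < 32);
    unsigned steps = i - j + 1;
    uint32_t aligned = divisor << (i - j);    /* align divisor with dividend */
    uint32_t acc = dividend;
    for (unsigned k = 0; k < steps; k++)
        acc = subc(acc, aligned);
    *quotient  = acc & ((UINT32_C(1) << steps) - 1u);  /* lower i-j+1 bits */
    *remainder = acc >> steps;                         /* upper bits       */
    return acc;
}
```

For 33 / 5 (i = 6, j = 3), the intermediate values are 66, 53, 27, and 54; the lower four bits of 54 give the quotient 6, and the upper bits give the remainder 3.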


As an example, consider the division of 33 by 5, using both long division and
the SUBC method (see Figure 3-1). In this case, i = 6 and j = 3, so the
SUBC operation is repeated 6 - 3 + 1 = 4 times.

Figure 3-1. Long Division and SUBC Method 


Long division:

                110       Quotient
       101 ) 100001
             -101
               1101
              -101
                 11       Remainder

SUBC method:

    00000000000000000000000000100001   Dividend
  - 00000000000000000000000000101000   Divisor (aligned)
    Negative difference                (first SUBC command)

    00000000000000000000000001000010   New dividend + quotient
  - 00000000000000000000000000101000   Divisor
    00000000000000000000000000011010   Difference (> 0) (second SUBC command)

    00000000000000000000000000110101   New dividend + quotient
  - 00000000000000000000000000101000   Divisor
    00000000000000000000000000001101   Difference (> 0) (third SUBC command)

    00000000000000000000000000011011   New dividend + quotient
  - 00000000000000000000000000101000   Divisor
    Negative difference                (fourth SUBC command)

    00000000000000000000000000110110   Final result: remainder 11 (= 3) in the
                                       upper bits, quotient 0110 (= 6) in the
                                       lower 4 bits


When the SUBC instruction is used, both the dividend and the divisor must be
positive. Example 3-5 shows integer division in which the sign of the quotient
is handled properly. The last instruction before returning modifies the
condition flags, in case subsequent operations depend on the sign of the
result.
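The sign handling and normalization in Example 3-5 can be mirrored in C. In this sketch, the position of the most significant 1 bit stands in for the exponent that FLOAT produces, so the difference between the two positions plays the role of COUNT. It is a hypothetical host-side model (divi_model and msb_index are illustrative names), and it rejects INT32_MIN operands rather than reproducing what ABSI does with them.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant 1 bit; for x >= 1 this equals the
   exponent e that FLOAT would give x (1 <= x / 2^e < 2). */
static int msb_index(uint32_t x)
{
    int e = -1;
    while (x != 0) { x >>= 1; e++; }
    return e;
}

/* Hypothetical C model of DIVI: signed quotient, truncated toward 0. */
static int32_t divi_model(int32_t dividend, int32_t divisor)
{
    assert(divisor != 0 && dividend != INT32_MIN && divisor != INT32_MIN);
    int negative = (dividend < 0) != (divisor < 0);          /* XOR ...,SIGN */
    uint32_t r0 = (uint32_t)(dividend < 0 ? -dividend : dividend);  /* ABSI */
    uint32_t r1 = (uint32_t)(divisor < 0 ? -divisor : divisor);     /* ABSI */
    if (r1 > r0)
        return 0;                                            /* BGTD ZERO */
    int count = msb_index(r0) - msb_index(r1);               /* exponent difference */
    r1 <<= count;                                            /* align divisor */
    for (int k = 0; k <= count; k++)                         /* COUNT+1 SUBCs */
        r0 = (r0 >= r1) ? (((r0 - r1) << 1) | 1u) : (r0 << 1);
    uint32_t q = r0 & ((UINT32_C(1) << (count + 1)) - 1u);   /* lower COUNT+1 bits */
    return negative ? -(int32_t)q : (int32_t)q;              /* LDINZ R1,R0 */
}
```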


Example 3-5. Integer Division 


*
* TITLE INTEGER DIVISION
*
* SUBROUTINE DIVI
*
* INPUTS:          SIGNED INTEGER DIVIDEND IN R0,
*                  SIGNED INTEGER DIVISOR IN R1
*
* OUTPUT:          R0/R1 INTO R0
*
* REGISTERS USED:  R0-R3, IR0, IR1
*
* OPERATION:       1. NORMALIZE DIVISOR WITH DIVIDEND
*                  2. REPEAT SUBC
*                  3. QUOTIENT IS IN LSBs OF RESULT
*
* CYCLES:          31-62 (DEPENDS ON AMOUNT OF NORMALIZATION)
*
        .globl  DIVI

SIGN    .set    R2
TEMPF   .set    R3
TEMP    .set    IR0
COUNT   .set    IR1
*
* DIVI - SIGNED DIVISION
*
DIVI:
*
* DETERMINE SIGN OF RESULT. GET ABSOLUTE VALUE OF OPERANDS.
*
        XOR     R0,R1,SIGN      ; Get the sign
        ABSI    R0
        ABSI    R1
        CMPI    R0,R1           ; Divisor > dividend ?
        BGTD    ZERO            ; If so, return 0
*
* NORMALIZE OPERANDS. USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT
* FOR DIVISOR AND AS REPEAT COUNT FOR 'SUBC'.
*
        FLOAT   R0,TEMPF        ; Normalize dividend
        PUSHF   TEMPF           ; PUSH as float
        POP     COUNT           ; POP as int
        LSH     -24,COUNT       ; Get dividend exponent

        FLOAT   R1,TEMPF        ; Normalize divisor
        PUSHF   TEMPF           ; PUSH as float
        POP     TEMP            ; POP as int
        LSH     -24,TEMP        ; Get divisor exponent

        SUBI    TEMP,COUNT      ; Get difference in exponents
        LSH     COUNT,R1        ; Align divisor with dividend
*
* DO COUNT+1 SUBTRACT & SHIFTS.
*
        RPTS    COUNT
        SUBC    R1,R0
*
* MASK OFF THE LOWER COUNT+1 BITS OF R0.
*
        SUBRI   31,COUNT        ; Shift count is (32 - (COUNT+1))
        LSH     COUNT,R0        ; Shift left
        NEGI    COUNT
        LSH     COUNT,R0        ; Shift right to get result
*
* CHECK SIGN AND NEGATE RESULT IF NECESSARY.
*
        NEGI    R0,R1           ; Negate result
        ASH     -31,SIGN        ; Check sign
        LDINZ   R1,R0           ; If set, use negative result
        CMPI    0,R0            ; Set status from result
        RETS
*
* RETURN 0.
*
ZERO:   LDI     0,R0
        RETS

        .end





If the dividend is less than the divisor and you want fractional division, first
decide how many bits of accuracy you need in the quotient. If the desired
accuracy is k bits, shift the dividend left by k positions, and then apply the
algorithm described above with i replaced by i + k. This assumes that
i + k is less than 32.
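Fractional division as just described can be sketched in C like this. The function name frac_divide is hypothetical, and the sketch assumes 0 < dividend < divisor and that i + k stays below 32, as the text requires. The result is the quotient scaled by 2^k, so frac_divide(1, 3, 8) returns 85, which represents 85/256, or about 0.332.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits of x (the i and j of the text). */
static unsigned sig_bits(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) { x >>= 1; n++; }
    return n;
}

/* Hypothetical sketch: dividend/divisor with k fraction bits, for
   0 < dividend < divisor. The dividend is shifted left by k, and SUBC
   is repeated (i + k) - j + 1 times. Returns the quotient times 2^k. */
static uint32_t frac_divide(uint32_t dividend, uint32_t divisor, unsigned k)
{
    unsigned i = sig_bits(dividend) + k;   /* i replaced by i + k */
    unsigned j = sig_bits(divisor);
    assert(dividend > 0 && divisor > dividend && i < 32 && j <= i);
    uint32_t acc = dividend << k;
    uint32_t aligned = divisor << (i - j);
    unsigned steps = i - j + 1;
    for (unsigned s = 0; s < steps; s++)   /* SUBC, repeated */
        acc = (acc >= aligned) ? (((acc - aligned) << 1) | 1u) : (acc << 1);
    return acc & ((UINT32_C(1) << steps) - 1u);
}
```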
3.4.2 Floating-Point Inverse and Division

This section explains how to implement floating-point division on the ’C3x.
Because the algorithm outlined here computes the inverse of a number v, you
perform y / v by multiplying y by the inverse of v.

The computation of 1/v is based on the following iterative algorithm. At the
ith iteration, the estimate x[i] of 1/v is computed from v and the previous
estimate x[i-1] according to the following formula:

x[i] = x[i-1] * (2.0 - v * x[i-1])

To start the operation, an initial estimate x[0] is needed. If v = a * 2^e, with
1 <= a < 2, a good initial estimate is:

x[0] = 1.0 * 2^(-e-1)

With this choice, v * x[0] = a/2 lies between 0.5 and 1, so the iteration
converges.

Example 3-6 shows the implementation of this algorithm on the ’C3x, where
the iteration is applied five times. Both accuracy and speed are affected by the
number of iterations. The accuracy offered by the single-precision
floating-point format is 2^-23 = 1.192E-7. If you want more accuracy, use
more iterations. If you want less accuracy, reduce the number of iterations to
decrease the execution time.
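The iteration and the initial estimate above can be tried out in C before you commit to an iteration count. The following sketch works in double precision rather than the ’C3x format, so it shows the convergence of the method, not the exact bits that Example 3-6 produces. The names inv_newton, exponent_of, and pow2 are hypothetical helpers. Because 1 - v * x[i] = (1 - v * x[i-1])^2 and the initial error is at most 0.5, five iterations leave a relative error of at most 0.5^32, about 2.3E-10, in exact arithmetic.

```c
#include <assert.h>

/* For v > 0, return e such that v = a * 2^e with 1 <= a < 2. */
static int exponent_of(double v)
{
    int e = 0;
    while (v >= 2.0) { v /= 2.0; e++; }
    while (v < 1.0)  { v *= 2.0; e--; }
    return e;
}

/* 2^e computed by repeated doubling or halving (no libm needed). */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Hypothetical model of INVF: Newton-Raphson reciprocal of v. */
static double inv_newton(double v, int iterations)
{
    double a = v < 0.0 ? -v : v;              /* the algorithm uses |v|      */
    assert(a > 1.0e-300 && a < 1.0e300);      /* finite, nonzero inputs only */
    double x = pow2(-exponent_of(a) - 1);     /* x[0] = 1.0 * 2^(-e-1)       */
    for (int i = 0; i < iterations; i++)
        x = x * (2.0 - a * x);                /* x[i] = x[i-1] * (2.0 - v * x[i-1]) */
    return v < 0.0 ? -x : x;                  /* restore the sign, as LDFN does */
}
```

For example, inv_newton(1.0, 3) returns 0.99609375 (x[0] = 0.5, then 0.75, 0.9375, and 0.99609375), which shows how each iteration roughly doubles the number of correct bits.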

This algorithm properly treats the boundary conditions when the input number
either is 0 or has a very large value. When the input is 0, the exponent
e = -128. The calculation of x[0] then yields an exponent equal to
-(-128) - 1 = 127, and the algorithm overflows and saturates. On the other
hand, for a very large number with e = 127, the exponent of x[0] is
-127 - 1 = -128. This causes the algorithm to yield 0, which is a reasonable way
to handle that boundary condition.


Example 3-6. Inverse of a Floating-Point Number 


*
* TITLE INVERSE OF A FLOATING-POINT NUMBER
*
* SUBROUTINE INVF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
* COMPUTATION IS COMPLETED, 1/v IS ALSO STORED IN R0.
*
* TYPICAL CALLING SEQUENCE:
*
*       LDF     v,R0
*       CALL    INVF
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+----------------------------------------------------
*  R0       | v = NUMBER TO FIND THE RECIPROCAL OF (UPON THE CALL)
*  R0       | 1/v (UPON THE RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, R3
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 35  WORDS: 32
*
        .global INVF

INVF:   LDF     R0,R3           ; v is saved for later
        ABSF    R0              ; The algorithm uses v = |v|
*
* EXTRACT THE EXPONENT OF v.
*
        PUSHF   R0
        POP     R1
        ASH     -24,R1          ; The 8 LSBs of R1 contain the exponent
                                ; of v
*
* x[0] FORMATION GIVEN THE EXPONENT OF v.
*
        NEGI    R1
        SUBI    1,R1            ; Now we have -e-1, the exponent of x[0]
        ASH     24,R1
        PUSH    R1
        POPF    R1              ; Now R1 = x[0] = 1.0 * 2**(-e-1)
*
* NOW THE ITERATIONS BEGIN.
*
        MPYF    R1,R0,R2        ; R2 = v * x[0]
        SUBRF   2.0,R2          ; R2 = 2.0 - v * x[0]
        MPYF    R2,R1           ; R1 = x[1] = x[0] * (2.0 - v * x[0])

        MPYF    R1,R0,R2        ; R2 = v * x[1]
        SUBRF   2.0,R2          ; R2 = 2.0 - v * x[1]
        MPYF    R2,R1           ; R1 = x[2] = x[1] * (2.0 - v * x[1])

        MPYF    R1,R0,R2        ; R2 = v * x[2]
        SUBRF   2.0,R2          ; R2 = 2.0 - v * x[2]
        MPYF    R2,R1           ; R1 = x[3] = x[2] * (2.0 - v * x[2])

        MPYF    R1,R0,R2        ; R2 = v * x[3]
        SUBRF   2.0,R2          ; R2 = 2.0 - v * x[3]
        MPYF    R2,R1           ; R1 = x[4] = x[3] * (2.0 - v * x[3])

        RND     R1              ; This minimizes error in the LSBs
*
* FOR THE LAST ITERATION WE USE THE FORMULATION:
*
*       x[5] = (x[4] * (1.0 - (v * x[4]))) + x[4]
*
        MPYF    R1,R0,R2        ; R2 = v * x[4] = 1.0..01.. => 1
        SUBRF   1.0,R2          ; R2 = 1.0 - v * x[4] = 0.0..01... => 0
        MPYF    R1,R2           ; R2 = x[4] * (1.0 - v * x[4])
        ADDF    R2,R1           ; R1 = x[5] = (x[4] * (1.0 - (v * x[4]))) + x[4]
        RND     R1,R0           ; Round since this is followed by a MPYF
*
* NOW THE CASE OF v < 0 IS HANDLED.
*
        NEGF    R0,R2
        LDF     R3,R3           ; This sets condition flags
        LDFN    R2,R0           ; If v < 0, then R0 = -R0
        RETS
*
* END
*
        .end
3.5 Square Root Computation

The ’C3x computes a square root with an iterative algorithm similar to the one
used to compute the inverse. This algorithm computes the inverse of the square
root of a number v, 1/SQRT(v). To derive SQRT(v), multiply this result by v.
Because many applications need to divide by the square root of a number,
obtaining 1/SQRT(v) directly saves the effort of computing that inverse
separately.

At the ith iteration, the estimate x[i] of 1/SQRT(v) is computed from v and the
previous estimate x[i-1] according to this formula:

x[i] = x[i-1] * (1.5 - (v/2) * x[i-1] * x[i-1])

To start the operation, an initial estimate x[0] is needed. If v = a * 2^e, a good
initial estimate is:

x[0] = 1.0 * 2^(-e/2)

where e/2 is rounded up to an integer (Example 3-7 does this by adding 1 to the
exponent before shifting it right by one bit).

Example 3-7 shows the implementation of this algorithm on the ’C3x, where 
the iteration is applied five times. Both accuracy and speed are affected by the 
number of iterations. If you want more accuracy and less speed, increase the 
number of iterations. If you want less accuracy and more speed, reduce the 
number of iterations. 
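A double-precision C sketch of the same scheme follows. It is not bit-exact with Example 3-7; it only illustrates the iteration, the initial estimate with e/2 rounded up (the ADDI 1 / ASH -1 pair in the listing), and the final multiplication by v. The names sqrt_newton, half_up, exponent_of, and pow2 are hypothetical. Like the RETSLE in the listing, it returns nonpositive inputs unchanged.

```c
#include <assert.h>

/* For v > 0, return e such that v = a * 2^e with 1 <= a < 2. */
static int exponent_of(double v)
{
    int e = 0;
    while (v >= 2.0) { v /= 2.0; e++; }
    while (v < 1.0)  { v *= 2.0; e--; }
    return e;
}

/* 2^e without libm. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* e/2 rounded up, i.e. floor((e + 1) / 2), without shifting negative ints. */
static int half_up(int e)
{
    return (e >= 0) ? (e + 1) / 2 : -((-e) / 2);
}

/* Hypothetical model of SQRT: iterate toward 1/SQRT(v), then multiply by v. */
static double sqrt_newton(double v, int iterations)
{
    if (v <= 0.0)
        return v;                               /* like RETSLE */
    assert(v > 1.0e-300 && v < 1.0e300);
    double x = pow2(-half_up(exponent_of(v)));  /* x[0] = 1.0 * 2^(-e/2) */
    double half_v = 0.5 * v;                    /* v/2 */
    for (int i = 0; i < iterations; i++)
        x = x * (1.5 - half_v * x * x);         /* x[i] moves toward 1/SQRT(v) */
    return x * v;                               /* SQRT(v) = v * (1/SQRT(v)) */
}
```

For example, sqrt_newton(4.0, 5) returns exactly 2.0, because x[0] = 0.5 is already the exact inverse square root of 4.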
Example 3-7. Square Root of a Floating-Point Number


*
* TITLE SQUARE ROOT OF A FLOATING-POINT NUMBER
*
* SUBROUTINE SQRT
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
* COMPUTATION IS COMPLETED, SQRT(v) IS ALSO STORED IN R0. NOTE
* THAT THE ALGORITHM ACTUALLY COMPUTES 1/SQRT(v).
*
* TYPICAL CALLING SEQUENCE:
*
*       LDF     v,R0
*       CALL    SQRT
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------------------
*  R0       | v = NUMBER TO FIND THE SQUARE ROOT OF
*           | (UPON THE CALL)
*  R0       | SQRT(v) (UPON THE RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, R3
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 50  WORDS: 39
*
        .global SQRT
*
* EXTRACT THE EXPONENT OF v.
*
SQRT:   LDF     R0,R3           ; Save v
        RETSLE                  ; Return if number is nonpositive
        PUSHF   R0
        POP     R1
        ASH     -24,R1          ; The 8 LSBs of R1 contain exponent of v
        ADDI    1,R1            ; Add a rounding bit in the exponent
        ASH     -1,R1           ; e/2
*
* x[0] FORMATION GIVEN THE EXPONENT OF v.
*
        NEGI    R1
        ASH     24,R1
        PUSH    R1
        POPF    R1              ; Now R1 = x[0] = 1.0 * 2**(-e/2)
*
* GENERATE v/2.
*
        MPYF    0.5,R0          ; v/2 and take rounding bit out
*
* NOW THE ITERATIONS BEGIN.
*
        MPYF    R1,R1,R2        ; R2 = x[0] * x[0]
        MPYF    R0,R2           ; R2 = (v/2) * x[0] * x[0]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[0] * x[0]
        MPYF    R2,R1           ; R1 = x[1] = x[0] * (1.5 - (v/2)*x[0]*x[0])
        RND     R1

        MPYF    R1,R1,R2        ; R2 = x[1] * x[1]
        MPYF    R0,R2           ; R2 = (v/2) * x[1] * x[1]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[1] * x[1]
        MPYF    R2,R1           ; R1 = x[2] = x[1] * (1.5 - (v/2)*x[1]*x[1])
        RND     R1

        MPYF    R1,R1,R2        ; R2 = x[2] * x[2]
        MPYF    R0,R2           ; R2 = (v/2) * x[2] * x[2]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[2] * x[2]
        MPYF    R2,R1           ; R1 = x[3] = x[2] * (1.5 - (v/2)*x[2]*x[2])
        RND     R1

        MPYF    R1,R1,R2        ; R2 = x[3] * x[3]
        MPYF    R0,R2           ; R2 = (v/2) * x[3] * x[3]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[3] * x[3]
        MPYF    R2,R1           ; R1 = x[4] = x[3] * (1.5 - (v/2)*x[3]*x[3])
        RND     R1

        MPYF    R1,R1,R2        ; R2 = x[4] * x[4]
        MPYF    R0,R2           ; R2 = (v/2) * x[4] * x[4]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[4] * x[4]
        MPYF    R2,R1           ; R1 = x[5] = x[4] * (1.5 - (v/2)*x[4]*x[4])

        RND     R1,R0           ; Round
        MPYF    R3,R0           ; SQRT(v) = v * (1/SQRT(v))
        RETS
*
* END
*
        .end
3.6 Extended-Precision Arithmetic

The ’C3x offers 32 bits of precision for integer arithmetic and 24 bits of
precision in the mantissa for floating-point arithmetic. For higher precision in
floating-point operations, the eight extended-precision registers R0-R7 contain
eight additional bits of accuracy. Because no comparable extension is available
for fixed-point arithmetic, this section shows how to achieve double-precision
fixed-point arithmetic in software. The technique consists of performing the
arithmetic by parts, much as in longhand arithmetic.

In the instruction set, the ADDC (add with carry) and SUBB (subtract with
borrow) instructions use the status carry bit for extended-precision arithmetic.
The carry bit is affected by the arithmetic operations of the arithmetic logic unit
(ALU) and by the rotate and shift instructions. It can also be manipulated
directly by setting the status register to certain values. For proper operation, the
overflow mode bit should be reset (OVM = 0) so that the results are not
replaced by saturation values. Example 3-8 and Example 3-9 show 64-bit
addition and 64-bit subtraction. The first operand is stored in registers R0 (low
word) and R1 (high word). The second operand is stored in R2 (low word) and
R3 (high word). The result is stored in R0 and R1.

Example 3-8. 64-Bit Addition 


*
* TITLE 64-BIT ADDITION
*
* TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER, PRODUCING
* A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE
* ADDED, RESULTING IN W (R1,R0).
*
*          R1   R0
*      +   R3   R2
*      -----------
*          R1   R0
*
        ADDI    R2,R0
        ADDC    R3,R1
Example 3-9. 64-Bit Subtraction


*
* TITLE 64-BIT SUBTRACTION
*
* TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER,
* PRODUCING A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND
* Y (R3,R2) ARE SUBTRACTED, RESULTING IN W (R1,R0).
*
*          R1   R0
*      -   R3   R2
*      -----------
*          R1   R0
*
        SUBI    R2,R0
        SUBB    R3,R1
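On a host, the carry and borrow that ADDC and SUBB consume can be recreated from unsigned 32-bit wraparound. The following C sketch mirrors Examples 3-8 and 3-9 with the high and low words passed separately. The names add64 and sub64 are hypothetical, and the carry is derived by comparison instead of being read from a status register.

```c
#include <assert.h>
#include <stdint.h>

/* (hi,lo) += (y_hi,y_lo), in the style of Example 3-8. */
static void add64(uint32_t *hi, uint32_t *lo, uint32_t y_hi, uint32_t y_lo)
{
    uint32_t sum_lo = *lo + y_lo;          /* ADDI R2,R0 (wraps mod 2^32) */
    uint32_t carry  = sum_lo < y_lo;       /* carry out of the low word   */
    *hi = *hi + y_hi + carry;              /* ADDC R3,R1                  */
    *lo = sum_lo;
}

/* (hi,lo) -= (y_hi,y_lo), in the style of Example 3-9. */
static void sub64(uint32_t *hi, uint32_t *lo, uint32_t y_hi, uint32_t y_lo)
{
    uint32_t borrow = *lo < y_lo;          /* borrow out of the low word  */
    *lo = *lo - y_lo;                      /* SUBI R2,R0                  */
    *hi = *hi - y_hi - borrow;             /* SUBB R3,R1                  */
}
```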




When two 32-bit numbers are multiplied, a 64-bit product results. The
procedure for multiplication is to split the 32-bit magnitude values of the
multiplicand X and the multiplier Y into two 16-bit parts each, (X1, X0) and
(Y1, Y0), respectively. The operation is done on unsigned numbers, and the
product is adjusted for the sign at the end. Example 3-10 shows the
implementation of a 32-bit by 32-bit multiplication.
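The partial-product scheme can be checked in C against a native 64-bit multiply. This sketch follows the structure of Example 3-10: magnitudes, four 16 x 16 products, a carry from the low word into the high word, and a final 64-bit negation. The C function name extmpy is hypothetical. It computes magnitudes with unsigned negation so that INT32_MIN is handled, which is a choice of the sketch, not a statement about ABSI. P2 + P3 cannot overflow 32 bits because X1 and Y1 are at most 0x8000 after taking magnitudes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of EXTMPY: signed 32 x 32 -> 64-bit product (w1:w0). */
static void extmpy(int32_t x, int32_t y, uint32_t *w1, uint32_t *w0)
{
    int negative = (x < 0) != (y < 0);                      /* XOR3 sign */
    uint32_t ax = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;   /* magnitudes */
    uint32_t ay = y < 0 ? 0u - (uint32_t)y : (uint32_t)y;
    uint32_t x1 = ax >> 16, x0 = ax & 0xFFFFu;              /* split X */
    uint32_t y1 = ay >> 16, y0 = ay & 0xFFFFu;              /* split Y */
    uint32_t p1 = x0 * y0, p2 = x0 * y1;                    /* P1, P2 */
    uint32_t p3 = x1 * y0, p4 = x1 * y1;                    /* P3, P4 */
    uint32_t mid = p2 + p3;                                 /* P2 + P3 */
    uint32_t lo = p1 + (mid << 16);                         /* W0 */
    uint32_t hi = p4 + (mid >> 16) + (lo < p1);             /* W1, with carry */
    if (negative) {                                         /* 64-bit negate */
        lo = ~lo + 1u;                                      /* NOT / ADDI 1 */
        hi = ~hi + (lo == 0u);                              /* NOT / ADDC 0 */
    }
    *w1 = hi;
    *w0 = lo;
}
```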
Example 3-10. 32-Bit-by-32-Bit Multiplication


*
* TITLE 32 BIT X 32 BIT MULTIPLICATION
*
* SUBROUTINE EXTMPY
*
* FUNCTION: TWO 32-BIT NUMBERS ARE MULTIPLIED, PRODUCING A 64-BIT
* RESULT. THE TWO NUMBERS (X AND Y) ARE EACH SEPARATED INTO TWO
* PARTS (X1 X0) AND (Y1 Y0), WHERE X0, X1, Y0, AND Y1 ARE 16 BITS.
* THE TOP BIT IN X1 AND Y1 IS THE SIGN BIT. THE PRODUCT IS
* IN TWO WORDS (W0 AND W1). THE MULTIPLICATION IS PERFORMED ON
* POSITIVE NUMBERS, AND THE SIGN IS DETERMINED AT THE END.
*
*                     X1      X0            BITS OF PRODUCTS
*          x          Y1      Y0           (NOT COUNTING SIGN)   PRODUCT
*          -------------------
*                        X0*Y0                  16 + 16            P1
*                X0*Y1                          16 + 16            P2
*                X1*Y0                          16 + 16            P3
*        X1*Y1                                  16 + 16            P4
*        ---------------------------
*            W1             W0
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+------------------------------------------
*  R0       | MULTIPLIER AND LOW WORD OF THE PRODUCT
*  R1       | MULTIPLICAND AND UPPER WORD OF THE PRODUCT
*
* REGISTERS USED AS INPUT: R0, R1
* REGISTERS MODIFIED: R0, R1, R2, R3, R4, AR0, AR1
* REGISTER CONTAINING RESULT: R0, R1
*
* CYCLES: 28 (WORST CASE)  WORDS: 25
*
        .global EXTMPY

EXTMPY  XOR3    R0,R1,AR0       ; Store sign
        ABSI    R0              ; Absolute values of X
        ABSI    R1              ; and Y
*
* SEPARATE MULTIPLIER AND MULTIPLICAND INTO TWO PARTS
*
        LDI     -16,AR1
        LSH3    AR1,R0,R2       ; R2 = X1 = upper 16 bits of X
        AND     0FFFFH,R0       ; R0 = X0 = lower 16 bits of X
        LSH3    AR1,R1,R3       ; R3 = Y1 = upper 16 bits of Y
        AND     0FFFFH,R1       ; R1 = Y0 = lower 16 bits of Y
*
* CARRY OUT THE MULTIPLICATION
*
        MPYI3   R0,R1,R4        ; X0*Y0 = P1
        MPYI    R3,R0           ; X0*Y1 = P2
        MPYI    R2,R1           ; X1*Y0 = P3
        ADDI    R0,R1           ; P2+P3
        MPYI    R2,R3           ; X1*Y1 = P4
        LDI     R1,R2
        LSH     16,R2           ; Lower 16 bits of P2+P3
        CMPI    0,AR0           ; Check the sign of the product
        BGED    DONE            ; If >0, multiplication complete
                                ; (delayed)
        LSH     -16,R1          ; Upper 16 bits of P2+P3
        ADDI3   R4,R2,R0        ; W0 = R0 = lower word of the product
        ADDC3   R1,R3,R1        ; W1 = R1 = upper word of the product
*
* NEGATE THE PRODUCT IF THE NUMBERS ARE OF OPPOSITE SIGNS
*
        NOT     R0
        ADDI    1,R0
        NOT     R1
        ADDC    0,R1

DONE    RETS

        .end
3.7 IEEE/TMS320C3X Floating-Point Format Conversion

The fast version of the IEEE-to-’C3x conversion routine was originally
developed by Apollo Computer, Inc. The other routines are based on this
algorithm.

In fixed-point arithmetic, the binary point that separates the integer part from
the fractional part of the number is fixed at a certain location. For example, if a
32-bit number has the binary point after the most significant bit (MSB), which
is also the sign bit, only fractional numbers (numbers with absolute values less
than 1) can be represented. A number with 31 fractional bits is called a Q31
number. All operations assume that the binary point is fixed at this location.
The fixed-point system, although simple to implement in hardware, limits the
dynamic range of the represented numbers. This causes scaling problems in
many applications. You can avoid this difficulty by using floating-point
numbers.

In a floating-point system, each integer or fraction is represented by three
fixed-point numbers that constitute a floating-point number. A floating-point
number consists of a mantissa, m, multiplied by base b raised to an
exponent e:

m * b^e

To provide the greatest resolution, the mantissa is typically a normalized
number with an absolute value between 1 and 2. Although the mantissa is
represented as a fixed-point number, the position of the binary point in the
actual value is determined by the exponent e.

To achieve greater efficiency in hardware implementation, the ’C3x uses a
floating-point format that differs from the IEEE standard. This section briefly
describes the two formats and presents software routines that show how to
convert between them.


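’C3x floating-point format:

    31        24   23   22                        0
   +------------+----+----------------------------+
   |     e      | s  |             f              |
   +------------+----+----------------------------+
         8        1               23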
In a 32-bit word representing a floating-point number in the ’C3x, the first eight
bits correspond to the exponent, expressed in twos-complement format.
There is one bit for the sign and 23 bits for the mantissa. The mantissa is
expressed in twos-complement form, with the binary point after the most
significant nonsign bit. Because this bit is the complement of the sign bit s, it is
suppressed; the mantissa actually has 24 bits. A special case occurs when
e = -128. In this case, the number is interpreted as 0, independently of the
values of s and f (which are set to 0 by default). The values of the represented
numbers in the ’C3x floating-point format are as follows:

   2^e * (01.f)    if s = 0
   2^e * (10.f)    if s = 1   (10.f is a twos-complement value, that is, -2 + 0.f)
   0               if e = -128
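The value definitions above translate directly into a small decoder, which is handy for checking conversion results on a host. This C sketch is illustrative (c3x_to_double is a hypothetical name); it uses double arithmetic, which represents every ’C3x single-precision value exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for a 32-bit 'C3x floating-point word:
   e = bits 31-24 (twos complement), s = bit 23, f = bits 22-0. */
static double c3x_to_double(uint32_t w)
{
    int e = (int)((w >> 24) & 0xFFu);
    if (e >= 128)
        e -= 256;                              /* twos-complement exponent  */
    uint32_t s = (w >> 23) & 1u;
    uint32_t f = w & 0x7FFFFFu;
    if (e == -128)
        return 0.0;                            /* reserved for zero         */
    double frac = (double)f / 8388608.0;       /* 0.f, with 2^23 = 8388608  */
    double mant = s ? -2.0 + frac : 1.0 + frac;   /* 10.f or 01.f           */
    double scale = 1.0;
    for (int k = 0; k < e; k++) scale *= 2.0;  /* 2^e without libm          */
    for (int k = 0; k > e; k--) scale /= 2.0;
    return mant * scale;
}
```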


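IEEE floating-point format:

    31   30        23   22                        0
   +----+------------+----------------------------+
   | s  |     e      |             f              |
   +----+------------+----------------------------+
     1        8                   23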


The IEEE floating-point format uses sign-magnitude notation for the mantissa,
and the exponent is biased by 127. In a 32-bit word representing a
floating-point number, the first bit is the sign bit. The next eight bits correspond
to the exponent, which is expressed in an offset-by-127 format (the actual
exponent is e - 127). The following 23 bits represent the absolute value of the
mantissa with the most significant 1 implied. The binary point is after this most
significant 1, so the mantissa actually has 24 bits. Several special cases are
summarized below.


These are the values of the numbers represented in the IEEE floating-point
format:

   (-1)^s * 2^(e-127) * (01.f)    if 0 < e < 255

Special cases:

   (-1)^s * 0.0                   if e = 0 and f = 0       (zero)
   (-1)^s * 2^(-126) * (0.f)      if e = 0 and f <> 0      (denormalized)
   (-1)^s * infinity              if e = 255 and f = 0     (infinity)
   NaN (not a number)             if e = 255 and f <> 0


Based on these definitions of the formats, two versions of the conversion
routines were developed. One version handles the complete definition of the
formats. The other ignores some of the special cases (typically the ones that are
rarely used) but executes faster than the complete conversion. In this
discussion, the two versions are referred to as the complete version and the
fast version, respectively.
3.7.1 IEEE-to-TMS320C3x Floating-Point Format Conversion

Example 3-11 shows the fast conversion from IEEE to ’C3x floating-point
format. It properly handles the general case when 0 < e < 255 and also handles
0s (that is, e = 0 and f = 0). The other special cases (denormalized, infinity,
and NaN) are not treated and, if present, give erroneous results.
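The bit manipulation behind Example 3-11 can be expressed in C as follows. This is a hypothetical host-side sketch (ieee_to_c3x_fast is an illustrative name) that operates on raw 32-bit words. For a positive number, the IEEE fraction can be reused unchanged and only the exponent is unbiased; for a negative number, the magnitude 1.f must be rewritten in the twos-complement form 10.f', which the listing does with NEGF. Like the fast version, the sketch does not handle denormals, infinities, or NaNs; it simply maps every e = 0 input to the ’C3x zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the fast IEEE -> 'C3x conversion (normal numbers
   and zero only). */
static uint32_t ieee_to_c3x_fast(uint32_t w)
{
    uint32_t sign = w >> 31;
    int32_t  eb   = (int32_t)((w >> 23) & 0xFFu);  /* biased IEEE exponent */
    uint32_t f    = w & 0x7FFFFFu;
    if (eb == 0)
        return 0x80000000u;                        /* 'C3x zero: e = -128 */
    int32_t e = eb - 127;                          /* unbias exponent */
    if (!sign)                                     /* +(01.f) * 2^e */
        return (((uint32_t)e & 0xFFu) << 24) | f;
    if (f == 0)                                    /* -1.0 * 2^e = -2.0 * 2^(e-1) */
        return (((uint32_t)(e - 1) & 0xFFu) << 24) | 0x800000u;
    /* -(1 + f/2^23) * 2^e = (-2 + (2^23 - f)/2^23) * 2^e */
    return (((uint32_t)e & 0xFFu) << 24) | 0x800000u | (0x800000u - f);
}
```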


Example 3-11. IEEE-to-TMS320C3x Conversion (Fast Version) 


* TITLE IEEE TO TMS320C3X CONVERSION (FAST VERSION)
*
* SUBROUTINE FMIEEE
*
* FUNCTION: CONVERSION BETWEEN THE IEEE FORMAT AND THE
*           TMS320C3X FLOATING-POINT FORMAT. THE NUMBER TO
*           BE CONVERTED IS IN THE LOWER 32 BITS OF R0.
*           THE RESULT IS STORED IN THE UPPER 32 BITS OF R0.
*
*           UPON ENTERING THE ROUTINE, AR1 POINTS TO THE
*           FOLLOWING TABLE:
*
*           (0) 0xFF800000 <- AR1
*           (1) 0xFF000000
*           (2) 0x7F000000
*           (3) 0x80000000
*           (4) 0x81000000
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+------------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
* REGISTERS USED AS INPUT: R0, AR1
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* NOTE: SINCE THE STACK POINTER SP IS USED, MAKE SURE TO
*       INITIALIZE IT IN THE CALLING PROGRAM.
*
* CYCLES: 12 (WORST CASE)  WORDS: 12
*
        .global FMIEEE

FMIEEE  AND3    R0,*AR1,R1      ; Replace fraction with 0
        BND     NEG             ; Test sign
        ADDI    R0,R1           ; Shift sign and exponent inserting 0
        LDIZ    *+AR1(1),R1     ; If all 0, generate C30 0
        SUBI    *+AR1(2),R1     ; Unbias exponent
        PUSH    R1
        POPF    R0              ; Load this as a flt. pt. number
        RETS

NEG     PUSH    R1
        POPF    R0              ; Load this as a flt. pt. number
        NEGF    R0,R0           ; Negate if orig. sign is negative
        RETS





Example 3-12 shows the complete conversion between the IEEE and ’C3x
formats. In addition to the general case and the 0s, it handles the special cases
as follows:

□ If NaN (e = 255, f <> 0), the number is returned intact.

□ If infinity (e = 255, f = 0), the output is saturated to the most positive or
most negative number, depending on the sign.

□ If denormalized (e = 0, f <> 0), two cases are considered. If the MSB of
f is 1, the number is converted to ’C3x format. Otherwise, an underflow
occurs, and the number is set to 0.
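The special-case rules listed above can be added to the same word-level sketch. In this hypothetical C model (ieee_to_c3x_complete is an illustrative name), the saturation values are the table entries 0x7F7FFFFF and 0x7F800000 from Example 3-12, and a denormal with the MSB of f set becomes 1.f' * 2^-127. One corner is an assumption of the sketch and is marked in the code: -2^-127 has no ’C3x representation, so the sketch returns 0 for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the complete IEEE -> 'C3x conversion. */
static uint32_t ieee_to_c3x_complete(uint32_t w)
{
    uint32_t sign = w >> 31;
    uint32_t eb   = (w >> 23) & 0xFFu;
    uint32_t f    = w & 0x7FFFFFu;
    int32_t  e;

    if (eb == 255) {
        if (f != 0)
            return w;                                /* NaN: returned intact */
        return sign ? 0x7F800000u : 0x7F7FFFFFu;     /* infinity: saturate   */
    }
    if (eb == 0) {
        if (!(f & 0x400000u))
            return 0x80000000u;                      /* 0, or underflow -> 0 */
        f = (f & 0x3FFFFFu) << 1;                    /* drop MSB, line up mantissa */
        e = -127;                                    /* value = 1.f' * 2^-127 */
    } else {
        e = (int32_t)eb - 127;                       /* unbias exponent */
    }
    if (!sign)
        return (((uint32_t)e & 0xFFu) << 24) | f;
    if (f != 0)                                      /* negative: 10.f' form */
        return (((uint32_t)e & 0xFFu) << 24) | 0x800000u | (0x800000u - f);
    if (e == -127)
        return 0x80000000u;   /* ASSUMPTION: -2^-127 is not representable; use 0 */
    return (((uint32_t)(e - 1) & 0xFFu) << 24) | 0x800000u;
}
```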
Example 3-12. IEEE-to-TMS320C3x Conversion (Complete Version)


* TITLE IEEE TO TMS320C3X CONVERSION (COMPLETE VERSION)
*
* SUBROUTINE FMIEEE1
*
* FUNCTION: CONVERSION BETWEEN THE IEEE FORMAT AND THE TMS320C3x
*           FLOATING-POINT FORMAT. THE NUMBER TO BE CONVERTED
*           IS IN THE LOWER 32 BITS OF R0. THE RESULT IS STORED
*           IN THE UPPER 32 BITS OF R0.
*
*           UPON ENTERING THE ROUTINE, AR1 POINTS TO THE FOLLOWING TABLE:
*
*           (0) 0xFF800000 <- AR1
*           (1) 0xFF000000
*           (2) 0x7F000000
*           (3) 0x80000000
*           (4) 0x81000000
*           (5) 0x7F800000
*           (6) 0x00400000
*           (7) 0x007FFFFF
*           (8) 0x7F7FFFFF
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+------------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
* REGISTERS USED AS INPUT: R0, AR1
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* NOTE: SINCE THE STACK POINTER SP IS USED, MAKE SURE TO
*       INITIALIZE IT IN THE CALLING PROGRAM.
*
* CYCLES: 23 (WORST CASE)  WORDS: 34
*
        .global FMIEEE1

FMIEEE1 LDI     R0,R1
        AND     *+AR1(5),R1
        BZ      UNNORM          ; If e = 0, number is either 0 or
        XOR     *+AR1(5),R1     ; denormalized
        BNZ     NORMAL          ; If e < 255, use regular routine
*
* HANDLE NaN AND INFINITY
*
        TSTB    *+AR1(7),R0
        RETSNZ                  ; Return if NaN
        LDI     R0,R0
        LDFGT   *+AR1(8),R0     ; If positive, infinity =
                                ; most positive number
        LDFN    *+AR1(5),R0     ; If negative, infinity =
                                ; most negative number
        RETS
*
* HANDLE 0s AND UNNORMALIZED NUMBERS
*
UNNORM  TSTB    *+AR1(6),R0     ; Is the MSB of f equal to 1?
        LDFZ    *+AR1(3),R0     ; If not, force the number to 0
        RETSZ                   ; and return
        XOR     *+AR1(6),R0     ; If MSB of f = 1, make it 0
        BND     NEG1
        LSH     1,R0            ; Eliminate sign bit
                                ; & line up mantissa
        SUBI    *+AR1(2),R0     ; Make e = -127
        PUSH    R0
        POPF    R0              ; Put number in floating-point format
        RETS

NEG1    POPF    R0
        NEGF    R0,R0           ; If negative, negate R0
        RETS
*
* HANDLE THE REGULAR CASES
*
NORMAL  AND3    R0,*AR1,R1      ; Replace fraction with 0
        BND     NEG             ; Test sign
        ADDI    R0,R1           ; Shift sign and exponent inserting 0
        SUBI    *+AR1(2),R1     ; Unbias exponent
        PUSH    R1
        POPF    R0              ; Load this as a flt. pt. number
        RETS

NEG     POPF    R0              ; Load this as a flt. pt. number
        NEGF    R0,R0           ; Negate if original sign negative
        RETS
3.7.2 TMS320C3x-to-IEEE Floating-Point Format Conversion

The majority of the numbers represented by the ’C3x floating-point format are
covered by the general IEEE format and the representation of 0s. The only
special case is e = -127 in the ’C3x format; this corresponds to a denormalized
number in IEEE format. It is ignored in the fast version but treated properly
in the complete version. Example 3-13 shows the fast version, and
Example 3-14 shows the complete version of the ’C3x-to-IEEE conversion.
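The reverse direction can be sketched the same way. This hypothetical C model (c3x_to_ieee_fast is an illustrative name) takes the magnitude of a negative ’C3x mantissa by hand, where Example 3-13 uses ABSF, and then adds the bias of 127. As in the fast version, a positive number with e = -127 (an IEEE denormal) is not treated, and the sketch does not check for results that exceed the IEEE range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the fast 'C3x -> IEEE single conversion. */
static uint32_t c3x_to_ieee_fast(uint32_t w)
{
    int32_t e = (int32_t)((w >> 24) & 0xFFu);
    if (e >= 128)
        e -= 256;                             /* twos-complement exponent */
    uint32_t s = (w >> 23) & 1u;
    uint32_t f = w & 0x7FFFFFu;
    if (e == -128)
        return 0x00000000u;                   /* 'C3x zero -> IEEE +0 */
    if (s && f == 0)
        e += 1;                               /* -2.0 * 2^e = -1.0 * 2^(e+1) */
    else if (s)
        f = 0x800000u - f;                    /* |-2 + 0.f| = 1 + (1 - 0.f) */
    /* e = -127 for a positive number would need an IEEE denormal; not handled. */
    return (s << 31) | ((uint32_t)(e + 127) << 23) | f;   /* add bias 127 */
}
```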

Example 3-13. TMS320C3x-to-IEEE Conversion (Fast Version) 


* 

•k 

TITLE TMS320C3X TO IEEE CONVERSION (EAST VERSION) 



■k 

* 

■k 

SUBROUTINE TOIEEE 



* 

EUNCTION: CONVERSION BETWEEN THE TMS320C3x EORMAT 

AND 

THE IEEE 

•k 

ELOATING-POINT EORMAT. THE NUMBER TO BE CONVERTED 



* 

IS IN THE UPPER 32 BITS OE RO. THE RESULT WILL BE 

IN 


* 

■k 

THE LOWER 32 BITS OE RO. 



■k 

■k 

UPON ENTERING THE ROUTINE, ARl POINTS TO THE EOLLOWING 

TABLE: 


(0) OxEESOOOOO <- ARl 



■k 

(1) OxEEOOOOOO 




(2) OxVEOOOOOO 



•k 

(3) 0x80000000 



* 

■k 

(4) 0x81000000 



* 

ARGUMENT AS SIGNMENT S: 



•k 

ARGUMENT | EUNCTION 



* 

-+- 



■k 

RO 1 NUMBER TO BE CONVERTED 



:k 

■k 

ARl 1 POINTER TO TABLE WITH CONSTANTS 



:k 

REGISTERS USED AS INPUT: RO, ARl 



•k 

REGISTERS MODIEIED: RO 



:k 

■k 

REGISTER CONTAINING RESULT: RO 



:k 

NOTE: SINCE THE STACK POINTER 'SP' IS USED, MAKE 

SURE 

TO 

■k 

* 

:k 

INITIALIZE IT IN THE CALLING PROGRAM. 
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Example 3-13. TMS320C3x-to-IEEE Conversion (Fast Version) (Continued) 


*   CYCLES: 14 (WORST CASE)   WORDS: 15
*
        .global TOIEEE
*
TOIEEE  LDF     R0,R0           ; Determine the sign of the number
        LDFZ    *+AR1(4),R0     ; If 0, load appropriate number
        BND     NEG             ; Branch to NEG if negative (delayed)
        ABSF    R0              ; Take the absolute value of the number
        LSH     1,R0            ; Eliminate the sign bit in R0
        PUSHF   R0
        POP     R0              ; Place number in lower 32 bits of R0
        ADDI    *+AR1(2),R0     ; Add exponent bias (127)
        LSH     -1,R0           ; Add the positive sign
        RETS

NEG     POP     R0              ; Place number in lower 32 bits of R0
        ADDI    *+AR1(2),R0     ; Add exponent bias (127)
        LSH     -1,R0           ; Make space for the sign
        ADDI    *+AR1(3),R0     ; Add the negative sign
        RETS
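The integer steps in Example 3-13 can be mirrored in C to show why the constant
table works. In this hypothetical model (toieee_steps is not a TI function),
mag is the ’C3x pattern of |x| as produced by ABSF, and negative records which
path the delayed branch took; for x = 0 the routine has already substituted
0x81000000 from table entry (4).

```c
#include <stdint.h>

/* Hypothetical C model of the integer steps of TOIEEE (Example 3-13). */
uint32_t toieee_steps(uint32_t mag, int negative)
{
    uint32_t r = (mag & 0xFF000000u)           /* exponent byte */
               | ((mag & 0x007FFFFFu) << 1);   /* LSH 1,R0 drops the sign; PUSHF/POP */
    r += 0x7F000000u;                          /* ADDI *+AR1(2): +127 in exponent byte, mod 2^32 */
    r >>= 1;                                   /* LSH -1,R0: logical shift, bit 31 becomes 0 */
    if (negative)
        r += 0x80000000u;                      /* ADDI *+AR1(3): set the IEEE sign bit */
    return r;
}
```

The wraparound in the ADDI step is what makes the zero substitution work:
0x81000000 + 0x7F000000 overflows to 0, so the result is the IEEE pattern for 0.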
Example 3-14. TMS320C3x-to-IEEE Conversion (Complete Version)


*
*   TITLE TMS320C3x TO IEEE CONVERSION (COMPLETE VERSION)
*
*
*   SUBROUTINE TOIEEE1
*
*   FUNCTION: CONVERSION BETWEEN THE TMS320C3x FORMAT AND THE IEEE
*             FLOATING-POINT FORMAT. THE NUMBER TO BE CONVERTED
*             IS IN THE UPPER 32 BITS OF R0. THE RESULT WILL BE IN
*             THE LOWER 32 BITS OF R0.
*
*   UPON ENTERING THE ROUTINE, AR1 POINTS TO THE FOLLOWING TABLE:
*
*   (0) 0xFF800000 <- AR1
*   (1) 0xFF000000
*   (2) 0x7F000000
*   (3) 0x80000000
*   (4) 0x81000000
*   (5) 0x7F800000
*   (6) 0x00400000
*   (7) 0x007FFFFF
*   (8) 0x7F7FFFFF
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+--------------------------------
*   R0       | NUMBER TO BE CONVERTED
*   AR1      | POINTER TO TABLE WITH CONSTANTS
*
*   REGISTERS USED AS INPUT: R0, AR1
*   REGISTERS MODIFIED: R0
*   REGISTER CONTAINING RESULT: R0
*
*   NOTE: SINCE THE STACK POINTER 'SP' IS USED, MAKE SURE TO
*         INITIALIZE IT IN THE CALLING PROGRAM.
*
*   CYCLES: 31 (WORST CASE)   WORDS: 25
*
        .global TOIEEE1

TOIEEE1 LDF     R0,R0           ; Determine the sign of the number
        LDFZ    *+AR1(4),R0     ; If 0, load appropriate number
        BND     NEG             ; Branch to NEG if negative (delayed)
        ABSF    R0              ; Take the absolute value of the number
        LSH     1,R0            ; Eliminate the sign bit in R0
        PUSHF   R0
        POP     R0              ; Place number in lower 32 bits of R0
        ADDI    *+AR1(2),R0     ; Add exponent bias (127)
        LSH     -1,R0           ; Add the positive sign
CONT    TSTB    *+AR1(5),R0
        RETSNZ                  ; If e > 0, return
        TSTB    *+AR1(7),R0
        RETSZ                   ; If e = 0 & f = 0, return
        PUSH    R0
        POPF    R0
        LSH     -1,R0           ; Shift f right by one bit
        PUSHF   R0
        POP     R0
        ADDI    *+AR1(6),R0     ; Add 1 to the MSB of f
        RETS

NEG     POP     R0              ; Place number in lower 32 bits of R0
        BRD     CONT
        ADDI    *+AR1(2),R0     ; Add exponent bias (127)
        LSH     -1,R0           ; Make space for the sign
        ADDI    *+AR1(3),R0     ; Add the negative sign
        RETS
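The extra code at CONT in Example 3-14 only changes results whose IEEE exponent
field is 0. The sketch below models that fix-up in C; the name toieee1_fixup is
hypothetical, and r is assumed to be the IEEE pattern produced by the fast path,
sign included.

```c
#include <stdint.h>

/* Hypothetical C model of the CONT block of TOIEEE1 (Example 3-14). */
uint32_t toieee1_fixup(uint32_t r)
{
    if (r & 0x7F800000u)                  /* TSTB *+AR1(5): exponent field nonzero */
        return r;                         /* RETSNZ */
    if ((r & 0x007FFFFFu) == 0)           /* TSTB *+AR1(7): fraction is zero */
        return r;                         /* RETSZ */
    /* Exponent field 0 here means 'C3x e = -127, i.e. 1.f x 2^-127.
       Rewrite it as the IEEE denormal 0.1f x 2^-126. */
    return (r & 0x80000000u)              /* sign is kept */
         | 0x00400000u                    /* ADDI *+AR1(6): 1 at the MSB of f */
         | ((r & 0x007FFFFFu) >> 1);      /* LSH -1: f shifted right one bit */
}
```

For example, the fast path turns the ’C3x value 0x81200000 (1.25 x 2^-127)
into 0x00200000; the fix-up changes this to 0x00500000, which is
0.625 x 2^-126.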
Chapter 4


Memory Interfacing 


The ’C3x interfaces connect to many device types. Each of these interfaces 
is tailored to a particular family of devices. 
4.1 System Configuration

The devices that can be interfaced to the ’C3x include memory, DMA devices, 
parallel and serial peripherals, and I/O devices. Figure 4-1 illustrates a typical 
configuration of a ’C3x system with various external devices and the interfaces 
to which they are connected. 

Figure 4-1. Possible System Configurations 



This block diagram represents a fully expanded system. In an actual design, you 
can use any subset of the illustrated configuration that is appropriate. 


4.2 External Interfaces 

The ’C3x interface type depends on the device to which it is to be connected. 
Each interface comprises one or more signal lines that transfer information and 
control its operation. Figure 4-2 shows the signal line groupings for each of 
these interfaces. 


Figure 4-2. External Interfaces on the TMS320C3x 
All of the interfaces are independent of one another, and you can perform
different operations simultaneously on each interface.

The primary and expansion buses implement the memory-mapped interface
to the device. The external direct memory access (DMA) interface allows
external devices to cause the processor to relinquish the primary bus and
allow direct memory access.


4.3 Primary Bus Interface 

The ’C3x uses the primary bus to access the majority of its memory-mapped
locations. When a large amount of external memory is required in a system, it
is interfaced to the primary bus. The ’C30 expansion bus (discussed in the
External Memory Interface chapter of the TMS320C3x User’s Guide) actually
comprises two mutually exclusive interfaces, controlled by the MSTRB and
IOSTRB signals. Cycles on the expansion bus that are controlled by the MSTRB
signal are equivalent to cycles on the primary bus, except that bank switching
is not implemented on the expansion bus. Accordingly, the discussion of primary
bus cycles in this section applies equally to MSTRB cycles on the expansion
bus.

Although you can use both the primary bus and the expansion bus to interface
to a wide variety of devices, those most commonly interfaced to these
buses are memory devices. This section presents detailed examples of
memory interfaces.
4.4 Zero-Wait-State Interface to Static RAMs

Zero-wait-state read access time for the ’C3x is determined by the difference
between the cycle time and the sum of the delay time from H1 low to address
valid and the data setup time before the next H1 low. (For more information,
see the appropriate TMS320C3x Digital Signal Processor data sheet.)

$$t_{\mathrm{access}} = t_{c(H)} - \left[ t_{d(H1L-A)} + t_{su(D)R} \right]$$

where:

t_access = zero-wait-state read access time (address valid to data valid)

tc(H) = H1/H3 cycle time

td(H1L-A) = H1 low to address valid

tsu(D)R = data valid before next H1 low (read)

For example, for a full-speed, zero-wait-state interface to any device, the
60-ns ’C3x requires a read access time of 30 ns from address valid to data
valid. For most memories, access time from a chip-select pin is the same as
access time from address valid; therefore, it is possible to use 30-ns memories
at full speed with the ’C3x-33, provided there are no delays between the
processor and the memories. In practice, interconnection delays and the gating
normally required for chip-select generation make this rare, so slightly faster
memories are required in most systems.
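The timing relation above reduces to simple arithmetic. The C helpers below
are a sketch with hypothetical names; all times are in nanoseconds. In the
test values, the 60-ns cycle time comes from the text and the 14-ns address
delay from t1 in Table 4-1, while the 16-ns setup time is only a placeholder
chosen to reproduce the 30-ns figure; take real values from the data sheet.

```c
/* Zero-wait-state read-access budget: tc(H) - [td(H1L-A) + tsu(D)R], in ns. */
double zero_ws_budget_ns(double tc_h, double td_h1l_a, double tsu_d_r)
{
    return tc_h - (td_h1l_a + tsu_d_r);
}

/* Nonzero if a memory with access time mem_ns, reached through path_ns of
   gating or interconnect delay, fits within the budget. */
int fits_zero_ws(double budget_ns, double path_ns, double mem_ns)
{
    return path_ns + mem_ns <= budget_ns;
}
```

With the CY7C186 circuit described in this section, a 5-ns inverter plus a
25-ns RAM just fits a 30-ns budget, while a 30-ns RAM behind the same inverter
does not.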

There are two distinct categories among currently available RAMs: 

□ RAMs without output enable (OE) control lines, which include the
1-bit-wide organized RAMs and most of the 4-bit-wide RAMs

□ RAMs with OE controls, which include the byte-wide RAMs and a few of 
the 4-bit-wide RAMs 

Many of the fastest RAMs do not provide OE control; they use chip-select 
(CS)-controlled write cycles to ensure that data outputs do not turn on for write 
operations. In CS-controlled write cycles, the write control line (WE) goes low 
before CS goes low, and internal logic holds the outputs disabled until the cycle 
is completed. Using CS-controlled write cycles is an efficient way to interface 
fast RAMs without OE controls to the ’C30 at full speed. 
In the case of RAMs with OE controls, using this signal can add flexibility to
many systems. Additionally, many of these devices can be interfaced by using 
CS-controlled write cycles with OE tied low, in the same manner as with RAMs 
without OE controls. There are, however, two requirements for interfacing to 
OE RAMs in this manner: 

□ The RAM’s OE input must be gated internally with the chip-select pin and
WE so that the device’s outputs do not turn on unless a read is being
performed.

□ The RAM must allow its address inputs to change while WE is low; some 
RAMs specifically prohibit this. 

Figure 4-3 shows the ’C3x interface to Cypress Semiconductor’s CY7C186 
25-ns 8K x 8-bit CMOS static RAM with the OE control input tied low and a 
CS-controlled write cycle. 
Figure 4-3. TMS320C3x Interface to Cypress Semiconductor’s CY7C186 CMOS SRAM
In this circuit, the two chip-select pins on the RAM are driven by the STRB and
A23 pins, which are ANDed together internally. A23 locates the RAM at
addresses 00000h through 03FFFh in external memory, and STRB establishes
the CS-controlled write cycle. The WE control input is then driven by the ’C3x
R/W signal. The OE input is not used and is connected to ground.


The timing of read operations, shown in Figure 4-4, is straightforward
because the two chip-select inputs are driven directly. The read access time
of the circuit is the inverter propagation delay plus the RAM’s chip-select
access time (t1 + t2 = 5 + 25 = 30 ns). This access time meets the ’C3x-33’s
specified 30-ns read access time requirement.


Figure 4-4. Read Operations Timing 




During write operations, shown in Figure 4-5, the RAM’s outputs do not turn
on at all, because of the chip-select-controlled write cycles. These cycles are
generated because R/W goes active (low) before the STRB term of the
chip-select input. Because the RAM’s output drivers are disabled whenever the
WE input is low (regardless of the state of the OE input), this interface
automatically avoids bus conflicts with the ’C3x. The circuit’s data setup and
hold times (t1 and t2 in Figure 4-5) of approximately 50 ns and 20 ns easily
meet the RAM’s minimum timing requirements of 10 ns and 0 ns.


Figure 4-5. Write Operations Timing 


If you require more complex chip-select decode than can be accomplished in 
time to meet zero-wait-state timing, you can use wait states (see section 4.5, 
Wait States and Ready Signal Generation) or bank-switching techniques (see 
section 4.5.6). 

The CY7C186 SRAM’s OE control is gated internally with a CS pin, so the RAM’s
outputs are not enabled unless the device is selected. This is critical if any
other devices are connected to the same bus. If no other devices are connected
to the bus, OE does not need to be gated internally with a chip-select pin.

To interface RAM without OE controls to the ’C3x with a single memory bank
and no other devices present on the bus, connect the memory’s CS input to
STRB directly. If several devices must be selected, an additional gate is
required to AND the device select and STRB pins in order to drive the CS input
that generates the chip-select-controlled write cycles. In either case, the WE
input is driven by the ’C3x R/W signal. If sufficiently fast gating is used,
25-ns RAMs can be used.

As with RAM with OE control lines, this approach works well only if a few banks
of memory are implemented and if the chip-select decode can be accomplished
with only one level of gating. If many banks are required to implement
very large memory spaces, bank switching can be used to provide for multiple
bank select generation and still maintain full-speed accesses within each
bank. Bank switching is discussed in detail in section 4.5.6 on page 4-15.
4.5 Wait States and Ready Signal Generation

Wait states can greatly increase system flexibility and reduce hardware 
requirements. The ’C3x can generate wait states on either the primary bus or 
the expansion bus; both buses have independent sets of ready control logic. 
This section discusses ready signal generation from the perspective of the 
primary bus interface. However, since wait-state operation on the expansion 
bus is similar to that on the primary bus, these discussions also pertain to 
expansion bus operation. Ready signal generation is not included in 
discussions of the expansion bus interface. See the TMS320C3x User’s Guide 
for more information. 

Wait states are generated on the basis of the: 

□ Internal wait-state generator 

□ External ready input (RDY) 

□ Logical AND or OR of the two 

When enabled, internally generated wait states affect all external cycles,
regardless of the address accessed. If different numbers of wait states are
required for various external devices, the external RDY input can be used to
tailor wait-state generation to specific system requirements.

If the logical AND (electrical OR) of the wait count and external ready signals 
is selected, the latter of the two signals controls the internal ready signal. Both 
signals must occur. Accordingly, external ready control must be implemented 
for each wait-state device, and the wait count ready signal must be enabled. 

If the logical OR (or electrical AND, since the signals are low true) of the
external and internal wait-count ready signals is selected, the earlier of the
two signals generates a ready condition and allows the cycle to be completed.
Only one of the two signals needs to occur.
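The two combining modes can be summarized with a small behavioral model. The C
sketch below is illustrative only (ready_cycle and NEVER are hypothetical
names); it counts whole H1 cycles from the start of the access and ignores
signal timing within a cycle.

```c
enum { NEVER = -1 };   /* the ready source never signals ready */

/* H1 cycle in which the bus cycle completes, given when the internal wait
   counter and the external RDY logic each signal ready. use_and selects the
   logical-AND mode; otherwise the logical-OR mode is modeled. */
int ready_cycle(int internal_ready, int external_ready, int use_and)
{
    if (use_and) {                        /* AND: both needed, the later one wins */
        if (internal_ready == NEVER || external_ready == NEVER)
            return NEVER;                 /* cycle never completes */
        return internal_ready > external_ready ? internal_ready : external_ready;
    }
    if (internal_ready == NEVER)          /* OR: the earlier one wins */
        return external_ready;
    if (external_ready == NEVER)
        return internal_ready;            /* internal count ends a silent access */
    return internal_ready < external_ready ? internal_ready : external_ready;
}
```

With OR, an access to nonexistent memory (external ready never asserted) still
ends when the internal count expires; with AND, the same access never
completes, which is the lockup described in section 4.5.1.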

4.5.1 ORing the Ready Signals 

Performing an OR of the two ready signals can implement wait states for
devices that require a greater number of wait states than are implemented with
external logic (up to seven). This is useful, for example, if a system contains
both fast and slow devices. In this case, fast devices can externally generate
a ready signal with a minimum of logic, and slow devices can use the internal
wait counter for larger numbers of wait states. When fast devices are
accessed, the external hardware responds promptly with a ready signal that
terminates the cycle. When slow devices are accessed, the external hardware
does not respond and the cycle is terminated after the internal wait count.
You can also perform an OR of the two ready signals if bus cycles must be
terminated before the number of wait states implemented by the external logic
has elapsed. In this case, the wait count that is specified internally
is shorter than the number of wait states implemented with the external ready
logic, and the bus cycle is terminated after the wait count. This technique can
also safeguard against inadvertent accesses to nonexistent memory that
would never respond with a ready signal and would otherwise lock up the ’C3x.

If an OR of the two ready signals is used and the internal wait-state count is
less than the number of wait states implemented externally, the external ready
generation logic must reset its sequencing so that a new cycle can begin
immediately after the internal wait count ends. This requires that consecutive
cycles come from independently decoded areas of memory and that the external
ready generation logic restart its sequence as soon as a new cycle begins.
Otherwise, the external ready generation logic can lose synchronization with
bus cycles and generate improperly timed wait states.


4.5.2 ANDing the Ready Signals 

Performing an AND of the two ready signals can implement wait states for
devices that are equipped to provide a ready signal but cannot respond quickly
enough to meet the ’C3x’s timing requirements. Specifically, if these devices
normally indicate a ready condition and, when accessed, respond with a wait
state until they are ready, using the logical AND of the two ready signals lowers
the chip count in the system. In this case, the internal wait counter provides
wait states initially and becomes ready after the external device has had time
to send a not-ready indication. The internal wait counter then remains ready
until the external device also becomes ready, which terminates the cycle.

In addition, performing an AND of the two ready signals can extend the number 
of wait states for devices that already have external ready logic implemented 
but require additional wait states under certain circumstances. 


4.5.3 External Ready Signal Generation 

The technique for implementing external ready generation hardware depends
on the characteristics of the system. The optimum approach to ready signal
generation varies, depending on the relative number of wait-state and
non-wait-state devices in the system and on the maximum number of wait states
required for any one device. The approach discussed here is general enough
for most applications and can easily be modified and applied to many different
system configurations.
Ready signal generation involves the following steps:

1) Segmenting the address space to distinguish fast and slow devices 

2) Generating properly timed ready indications 

3) Logically ORing all of the separate ready timing signals together to
connect to the physical ready input

Segmenting the address space, which is commonly performed by chip-select 
generation, is required to obtain a unique indication of each area within the 
address space that requires wait states. You can use chip-select signals to 
initiate wait states; however, chip-select decoding considerations may 
occasionally provide signals that do not meet ready input timing requirements. 
In this case, you can use a small number of address lines to segment coarse 
address space. The simpler gating allows signals to be generated more 
quickly. In either case, the signal that indicates a particular area of memory is 
being addressed normally initiates a ready or wait-state indication. 

Once the region of address space being accessed has been established, a 
timing circuit provides a ready indication to the processor at the appropriate 
point in the cycle. 

Finally, since indications of ready status from multiple devices are typically 
present, the signals are logically ORed by using a single gate to drive the RDY 
input. 
4.5.4 Ready Control Logic

You can take one of two basic approaches to implement ready control logic, 
depending on the state of the ready input between accesses: 

□ If RDY is low between accesses, the processor is always ready unless a 
wait state is required. 

Control of full-speed devices is straightforward; no action is necessary
because the ready signal is always active unless otherwise programmed.
Devices requiring wait states, however, must drive ready high fast enough
to meet the input timing requirements. Then, after an appropriate delay, a
ready indication must be generated. This can be difficult in many
circumstances, because wait-state devices are inherently slow and often require
complex select decoding.

□ If RDY is high between accesses, the processor enters a wait state unless 
a ready indication is generated. 

Zero-wait-state devices, which tend to be inherently fast, can usually
respond immediately with a ready indication. Wait-state devices can delay
their select signals to generate a ready indication. Typically, this approach
results in the most efficient implementation of ready control logic.
Figure 4-6 shows a circuit of this type, which can be used to generate
zero, one, or two wait states for multiple devices in a system.
Figure 4-6. Circuit for Generation of Zero, One, or Two Wait States for Multiple Devices


4.5.5 Example Circuit 

In the circuit in Figure 4-6, full-speed devices drive ready signals directly 
through the 74AS21 NOR gate, and the two flip-flops delay wait-state devices’ 
select signals one or two H1 cycles to provide one or two wait states. 
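The delay-chain idea can be sketched behaviorally in C. This is not a
gate-level model of Figure 4-6; wait_states_from_chain is a hypothetical helper
that assumes the device select stays active for the whole access and that each
flip-flop delays it by exactly one H1 cycle.

```c
/* Number of wait states before a select that passes through `stages`
   flip-flops (0, 1, or 2) reaches the ready gate, or -1 if it does not
   arrive within max_cycles H1 cycles. */
int wait_states_from_chain(unsigned stages, int max_cycles)
{
    unsigned chain = 0;                        /* bit k: select after k stages */
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        chain = (chain << 1) | 1u;             /* select held active this cycle */
        if (chain & (1u << stages))            /* delayed select reaches the gate */
            return cycle;                      /* ready now: `cycle` wait states */
    }
    return -1;                                 /* never ready in the window */
}
```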

Considering the ’C3x-33’s ready signal delay time of 8 ns following the
address, zero-wait-state devices must use ungated address lines directly to drive
the input of the ’74AS21, since this gate contributes a maximum propagation
delay of 6 ns to the RDY signal. Zero-wait-state devices must be grouped
together within a memory address range if other devices in the system require
wait states.

With this circuit, devices requiring wait states might take up to 36 ns from a
valid address on the ’C3x to provide inputs to the 74AS20 OR gate. This
usually allows sufficient time for any decoding required in generating select
signals for slower devices in the system. For example, the 74ALS138 decoder,
driven by the address bus and STRB pin, can generate select decodes
in 22 ns, which easily meets the ’C3x-33’s timing requirements.

With this circuit, unused inputs to either the 74AS20 OR gates or the ’74AS21
NOR gate must be tied to a logic high level to prevent noise from generating
spurious wait states.

If more than two wait states are required by devices within a system, other
approaches can be used for ready signal generation. If between three and seven
wait states are required, additional flip-flops can be included in the same
manner shown in Figure 4-6, or internally generated wait states can be used in
conjunction with external hardware. If more than seven wait states are
required, an external circuit using a counter can be used to supplement the
capabilities of the internal wait-state generators.


4.5.6 Bank-Switching Techniques 

The ’C3x’s programmable bank-switching feature can greatly ease bus conflicts
in system designs that require large amounts of memory. Normally, devices
take longer to release the bus than they take to drive the bus; bank switching
provides an interval, not otherwise available, during which all device selects
are disabled. During this interval, slow devices are allowed time to turn off
before other devices have the opportunity to drive the data bus, thus avoiding
bus contention. (See the TMS320C3x User’s Guide for further information on
bank switching.)

When a portion of the high-order address lines changes (as defined by the
contents of the BNKCMPR register) and bank switching is enabled, STRB goes
high for one full H1 cycle. If STRB is included in chip-select decodes, this
causes all devices to be disabled during this period. The next bank of devices
is not enabled until STRB goes low again.

In general, bank switching is not required during writes because write cycles
always exhibit an inherent one-half H1 cycle setup of address information
before STRB goes low. When you use bank switching for read/write devices, a
minimum of one-half H1 cycle of address setup is provided for all accesses.
Therefore, large amounts of memory can be accessed without requiring wait
states or extra hardware for isolation between banks. Access time for cycles
with bank switching is the same as that for cycles without bank switching.
Accordingly, full-speed accesses can still be accomplished within each bank.
When you use bank switching to implement large multiple-bank memory
systems, you must consider address line fanout and loading. Besides the
parametric specifications, which must be accounted for, ac characteristics are
crucial in memory system design. With large memory arrays, which commonly
require large numbers of address line inputs to be driven in parallel, capacitive
loading of address outputs is often quite large. Because all ’C3x timing
specifications are guaranteed only up to a capacitive load of 80 pF, greater
loads invalidate the guaranteed ac characteristics. It is often necessary to
buffer the address lines when using large memory arrays. The ac timing
specifications for buffer performance can then be derated according to the
manufacturer’s specifications to accommodate a wide variety of memory array
sizes.
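A first-pass loading check is simple multiplication. The C sketch below
(needs_address_buffer is a hypothetical name) compares the summed input
capacitance on one address line with the rated load; it ignores trace and
socket capacitance, so treat its estimate as a lower bound on the real load.

```c
/* Nonzero if the estimated load on one address line exceeds the rated load,
   suggesting that the line should be buffered. */
int needs_address_buffer(int inputs_per_line, double pf_per_input, double rated_pf)
{
    double load_pf = inputs_per_line * pf_per_input;   /* summed input capacitance */
    return load_pf > rated_pf;
}
```

For the bank-memory design discussed in this section, 16 inputs at 10 pF each
give 160 pF, twice the 80-pF rating, which is why that design buffers its
address lines.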

The circuit shown in Figure 4-7 illustrates the use of bank switching with 
Cypress Semiconductor’s CY7C185 25-ns 8K x 8-bit CMOS static RAM. This 
circuit implements 32K 32-bit words of memory with one-wait-state accesses 
for each bank. 

The bank memory requires a wait state with this implementation because of 
the added propagation delay presented by the address bus buffers used in the 
circuit. The wait state is not a function of the memory organization of multiple 
banks or the use of bank switching. Memory access speeds are the same with 
and without bank switching, once bank boundaries are crossed. No speed 
penalty is incurred by using bank switching, except for the occasional extra 
cycle inserted when bank boundaries are crossed. If this extra cycle impacts 
software performance significantly, you can often restructure code to minimize 
bank boundary crossings and reduce the effect of these boundary crossings 
on software performance. 

The wait state for this bank memory is generated by using the wait-state
generator circuit described in section 4.5.5 on page 4-14. Because the A23 signal
enables the entire bank memory system, the inverted version of this signal is
ANDed with STRB to derive a one-wait-state device select. This signal is then
connected in the circuit along with the other one-wait-state device selects. Any
time a bank memory access occurs, one wait state is generated.
Figure 4-7. Bank Switching for Cypress Semiconductor’s CY7C185 SRAM
Each of the four banks in this circuit is selected by a signal that the
74ALS138 decoder generates from A15-A13 (see Figure 4-8). With the
BNKCMPR register set to 0Bh, the banks are selected on even 8K-word
boundaries, starting at location 080A000h in external memory space.
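The bank-compare rule can be expressed in a few lines of C. The sketch assumes,
as the 0Bh/8K-word example implies, that BNKCMPR gives the number of address
MSBs compared (so the bank size is 2^(24 - BNKCMPR) words) and that the extra
STRB-high cycle occurs when those MSBs change between accesses; the function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bank size in words for a BNKCMPR value of 0-16 (MSBs compared). */
uint32_t bank_size_words(unsigned bnkcmpr)
{
    return 1u << (24u - bnkcmpr);
}

/* True if going from address prev to address next changes the top
   `bnkcmpr` bits of the 24-bit address, i.e. crosses a bank boundary. */
bool bank_switch(uint32_t prev, uint32_t next, unsigned bnkcmpr)
{
    if (bnkcmpr == 0)
        return false;                          /* no MSBs compared: one bank */
    unsigned shift = 24u - bnkcmpr;
    return ((prev & 0xFFFFFFu) >> shift) != ((next & 0xFFFFFFu) >> shift);
}
```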




Figure 4-8. Bank-Memory Control Logic
The ’C3x rated capacitive loading is 80 pF. The 74ALS254 buffers used on the
address lines are necessary in this design because the total capacitive load
presented to each address line is a maximum of 16 x 10 pF, or 160 pF (bank
memory plus zero-wait-state static RAM). The manufacturer’s derating curves
for these devices, at a load of 80 pF (the load presented by the bank memory),
predict a maximum propagation delay of 16 ns at the buffer outputs. The access
time of a read cycle within a bank of the memory is the sum of the memory
access time and the maximum buffer propagation delay (25 + 16 = 41 ns).
Because this access time exceeds the 30 ns available with zero wait states but
is within the 90 ns available with one wait state (30 ns plus one 60-ns H1
cycle), it requires only one wait state on the ’C3x-33.
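The one-wait-state conclusion follows from a simple count. The C sketch below
(wait_states_needed is a hypothetical name) assumes, as the 30-ns and 90-ns
figures above imply, that each wait state extends the available access time by
one H1 cycle; times are integer nanoseconds.

```c
/* Smallest number of wait states n such that access_ns fits within
   zero_ws_budget_ns + n * tc_h_ns. */
int wait_states_needed(int access_ns, int zero_ws_budget_ns, int tc_h_ns)
{
    int n = 0;
    while (access_ns > zero_ws_budget_ns + n * tc_h_ns)
        ++n;                                   /* each wait state adds one H1 cycle */
    return n;
}
```

For the bank memory above, wait_states_needed(41, 30, 60) gives 1.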

The ’74ALS254 buffers offer an additional system-performance enhancement:
they include 25-Ω resistors in series with each buffer output. These resistors
greatly improve the transient response characteristics of the buffers,
especially when driving CMOS loads, such as the memories used here. The
effect of these resistors is to reduce overshoot and ringing, which are common
when driving predominantly capacitive loads, such as CMOS devices. The
result is reduced noise and increased noise immunity in the circuit, which, in
turn, results in a more reliable memory system. Including these resistors in
the buffers eliminates the discrete series resistors that high-speed memory
systems often require.

This circuit cannot be implemented without bank switching, because the
turn-on and turn-off delays of the data outputs would otherwise cause bus
conflicts. The propagation delay of the ’74ALS138 decoder is involved only
during bank switches, when there is sufficient time between cycles to allow
new chip-selects to be decoded.

Figure 4-9 shows the timing of this circuit for read operations using bank
switching. With the BNKCMPR register set to 0Bh, when a bank switch occurs,
the bank address on address lines A23-A13 is updated during the extra H1
cycle while STRB is high. Then, after chip-select decodes have stabilized and
the previously selected bank has disabled its outputs, STRB goes low for the
next read cycle. Further accesses occur at normal bus timings with one wait
state, as long as another bank switch is not necessary. Write cycles do not
require bank switching because of the inherent address setup provided in their
timings. This timing is summarized in Table 4-1.


Figure 4-9. Timing for Read Operations Using Bank Switching
Table 4-1. Bank-Switching Interface Timing for the TMS320C3x-33

Time Interval   Event                                      Time Period
t1              H1 falling to address valid/STRB rising    14 ns
t2              Address valid to select delay              10 ns
t3              Memory disable from STRB                   10 ns
t4              H1 falling to STRB                         10 ns
t5              STRB to select delay                       4.5 ns
t6              Memory output enable delay                 3 ns
4.6 Interfacing Memory to the TMS320C32 DSP

The ’C32 accesses external memory with one 24-bit address bus, one 32-bit
data bus, and three strobes: IOSTRB, STRB0, and STRB1. The strobes are
mapped to selected portions of the memory map as shown in Figure 4-10 on
page 4-23. For example, if the CPU is reading data from location 881234h, the
active strobe during the read bus cycle is STRB0. Unlike the other two strobes,
STRB0 is assigned to two noncontiguous address spaces within the memory
map to provide extra flexibility in address decoding for glueless memory
interfaces.


The behavior of IOSTRB is similar to that of its counterpart in the ’C30. Its
timing characteristics are slightly relaxed in comparison with STRB0 and STRB1
cycles to better accommodate slower I/O peripherals. In contrast to STRB0
and STRB1, IOSTRB uses a single signal line and accesses the external data
one full 32-bit word at a time. STRB0 and STRB1 are composed of four signal
lines each. The multiple signal lines per strobe enable the STRB0 and STRB1
cycles to access external memory one byte, one half-word, or one full word at
a time. For example, to read a single byte from a 32-bit-wide external memory
location mapped to STRB0, the address on the address bus points to the
selected 32-bit word and only one STRB0 signal is activated (driven low) to select
the desired byte. To access two bytes of data at the memory location mapped
to STRB1, two STRB1 signal lines are asserted during the bus cycle. Full
32-bit bus cycles involving STRB0 or STRB1 memory space result in four
strobe signals simultaneously accessing four bytes of data. The 32-bit STRB0
and STRB1 bus cycles are no different functionally from the IOSTRB cycles
but simply have tighter timing parameters.
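The relation between access size and active strobe lines can be sketched in C.
This is a hypothetical model, not taken from the ’C32 data sheet: strobe_lines
is an illustrative name, and it assumes that each of the four lines of STRB0
or STRB1 enables one byte lane of a 32-bit memory, that lane i corresponds to
bit i, and that accesses are aligned (lane + size does not exceed 4).

```c
#include <stdint.h>

/* 4-bit pattern on one strobe's four lines (bit i = lane i, active low)
   for an aligned access of size_bytes (1, 2, or 4) starting at `lane`. */
uint8_t strobe_lines(unsigned lane, unsigned size_bytes)
{
    unsigned enabled = ((1u << size_bytes) - 1u) << lane;  /* lanes taking part */
    return (uint8_t)(~enabled & 0x0Fu);                      /* enabled lanes driven low */
}
```

A single byte drives one line low, a half-word two, and a full word all four,
matching the description above.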

The STRB0 and STRB1 cycles are not limited to just selecting bytes out of
32-bit memory locations. There are two strobe control registers that configure
the data size and memory width for STRB0 and STRB1 bus cycles (one control
register per strobe). With proper initialization of the strobe control registers, the
bus cycles can be configured to encompass any combination of data size and
physical memory width. For example, a byte can be read from a 16-bit-wide
memory or a 32-bit word can be written to an 8-bit-wide memory by configuring
the memory width and data size fields of the corresponding strobe control
registers (see Figure 4-10).

As on other members of the ’C3x generation, both program and data on the
’C32 can reside in any portion of the memory map. Program fetches from
address space mapped to IOSTRB are indistinguishable from IOSTRB data
reads or writes. However, STRB0 and STRB1 cycles are configured slightly
differently for program fetches than for data accesses. Program and data can
still share the same portions of the memory map, but instead of being set by
the memory width and data size fields of the STRB0 and STRB1 control
registers, program fetch cycles from the memory spaces mapped to STRB0
and STRB1 are configured by hardwiring the PRGW (program memory width
select) pin. The data size fields are not needed, because every program fetch
transfers a 32-bit instruction word. The memory width field cannot serve this
purpose either: at reset, when the processor fetches the reset vector from
memory, the strobe control registers always hold the same default
configuration, yet different systems can have different program memory
widths. The PRGW pin tells the memory interface whether the program
memory is 16 or 32 bits wide. Program memory that is 8 bits wide is not
supported, because the four bus cycles required per instruction would degrade
performance too much for most applications.
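As a rough illustration, the PRGW rule can be expressed as a C sketch. The names `fetch_cycles` and `assemble_opcode16` are hypothetical; the half-word order (least significant first) is the one this chapter states for all transfers in which the data size exceeds the physical memory width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bus cycles per instruction fetch from STRB0/STRB1
 * space. Only the PRGW pin matters (1 = high = 16-bit program memory,
 * 0 = low = 32-bit); the strobe control registers are not consulted. */
static unsigned fetch_cycles(int prgw_high)
{
    assert(prgw_high == 0 || prgw_high == 1);
    return prgw_high ? 2u : 1u;
}

/* With 16-bit program memory, the two half-words of one instruction arrive
 * least significant first; combine them into the 32-bit instruction word. */
static uint32_t assemble_opcode16(uint16_t first_half, uint16_t second_half)
{
    return (uint32_t)first_half | ((uint32_t)second_half << 16);
}
```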
Figure 4-10. STRB0 and STRB1 Control Registers and the PRGW Pin

4.6.1 Functional Description of the Enhanced Memory Interface

The enhanced memory interface controls all data and program traffic between
the data buses inside the chip and the 32-bit external memory bus, as shown in
Figure 4-11 through Figure 4-14. For any bus cycle involving a logical memory
address range mapped to IOSTRB, the memory interface simply connects the
external data bus with the appropriate internal data bus without further data
manipulation.

The memory interface is much busier when the ’C32 is accessing logical
memory addresses mapped to STRB0 and STRB1. Depending on the data
size and external memory width (as defined by the corresponding strobe control
registers), data can be packed, unpacked, truncated, or shifted on its way to
and from the chip.

Sections 4.6.1.1 through 4.6.1.4 illustrate how the data is manipulated when
the interface has to match variable-size data with 8-, 16-, and 32-bit-wide
physical memories. Each figure in these sections shows the same five lines of
code in program space:


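        LDI     4,RC          ; RC = 4, so the block below runs 5 times
        RPTB    L1            ; repeat through the instruction labeled L1
        LDI     *AR0++,R0     ; read an integer, post-increment AR0
        FLOAT   R0,R1         ; convert the integer to floating point
L1:     STF     R1,*AR1++     ; store the result, post-increment AR1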

These lines of code read five integers from one data space, convert them to 
floating-point format, and write them to another memory space that is assigned 
to a different strobe. Each example has a different combination of data sizes 
and external memory widths to illustrate the range of possible combinations. 

For data access and program fetch cycles in which the data size exceeds the 
physical memory width, the least significant bytes/half-words are always 
transferred first. 
4.6.1.1 STRB0 and STRB1 Data Access: Data Size = Memory Width

For STRB0 and STRB1 data accesses, the data size and memory width are
configured in the corresponding strobe control registers. In this example, the
data size equals the memory width for both strobes (see Table 4-2).

The short program stored in internal RAM0 begins with the load integer (LDI)
instruction reading an 8-bit integer from 8-bit-wide STRB0 memory (see
Figure 4-11). As the integer passes through the memory interface, it is sign
extended to 32 bits and loaded into R0 as a 32-bit integer. Next, the
integer-to-floating-point conversion (FLOAT) instruction converts the integer
in R0 to a 40-bit floating-point number and loads it into R1. Finally, the store
floating-point value (STF) instruction truncates the 40-bit contents of R1 to
32 bits and stores the result in the 16-bit-wide STRB1 memory. As the data
passes through the memory interface, the 24-bit mantissa is truncated to eight
bits (the 8-bit exponent remains unmodified).
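The two conversions described in this paragraph can be sketched in C. This is a model under stated assumptions, not a description of the silicon: the helper names are hypothetical, the sign extension is written out explicitly, and the 16-bit float truncation assumes the interface keeps bits 31–16 of the ’C3x single-precision word (exponent in bits 31–24, sign in bit 23, fraction in bits 22–0), which is what keeping the exponent and cutting the 24-bit mantissa to eight bits implies.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a byte read from 8-bit STRB0 memory to a 32-bit integer. */
static int32_t sign_extend8(uint8_t byte)
{
    return (byte & 0x80u) ? (int32_t)byte - 256 : (int32_t)byte;
}

/* Assumed 16-bit truncation of a 'C3x single-precision word
 * (exponent 31-24, sign 23, fraction 22-0): keep bits 31-16, so the
 * exponent survives and the mantissa is cut to sign + 7 fraction bits. */
static uint16_t truncate_float_to16(uint32_t c3x_float)
{
    return (uint16_t)(c3x_float >> 16);
}
```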


Table 4-2. STRB0 and STRB1 Data Access: Data Size = Memory Width

Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0      8                   8
Output data    STRB1     16                  16
Program        RAM0      32                  32
Figure 4-11. STRB0 and STRB1 Data Access: Data Size = Memory Width

4.6.1.2 STRB0 and STRB1 Data Access: Data Size ≠ Memory Width

The input and/or output data does not have to be the same size as the memory
it is read from or written to (see Table 4-3). The data size and memory width
for STRB0 and STRB1 data access cycles are configured in the corresponding
strobe control registers.

The short program stored in internal RAM1 begins with the LDI instruction
reading an 8-bit integer from 16-bit-wide STRB0 memory (see Figure 4-12).
Since each address contains two data bytes, the memory interface uses
different STRB0 lines to distinguish between the high byte and the low byte.
(STRB0 and STRB1 each comprise four signals, one for each byte of the
32-bit word.) Next, the FLOAT instruction converts the integer in R0 to a
40-bit floating-point number and loads it into R1. Finally, the STF instruction
stores the contents of R1 to 16-bit-wide memory as a 32-bit number. Before
the data arrives at the memory interface, the 32-bit mantissa is truncated to
24 bits (the 8-bit exponent remains unmodified). The memory interface then
stores the 24-bit mantissa and the 8-bit exponent in 16-bit-wide memory, two
bytes at a time, using two bus cycles and two physical memory addresses.
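The two-cycle store to 16-bit-wide memory can be pictured with a short C sketch. The function `split_for_16bit_memory` is hypothetical; it assumes the least significant half-word goes out first, at the lower of the two consecutive physical addresses, as this chapter states for all transfers whose data size exceeds the memory width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit word stored to 16-bit-wide memory takes two
 * bus cycles at consecutive physical addresses, least significant half first. */
static void split_for_16bit_memory(uint32_t word, uint32_t first_addr,
                                   uint32_t addr[2], uint16_t half[2])
{
    assert(addr != NULL && half != NULL);
    for (unsigned k = 0; k < 2; ++k) {
        addr[k] = first_addr + k;                 /* address of cycle k   */
        half[k] = (uint16_t)(word >> (16u * k));  /* k = 0: bits 15-0     */
    }
}
```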


Table 4-3. STRB0 and STRB1 Data Access: Data Size ≠ Memory Width

Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0      8                  16
Output data    STRB1     32                  16
Program        RAM1      32                  32
Figure 4-12. STRB0 and STRB1 Data Access: Data Size ≠ Memory Width

4.6.1.3 Program Fetch From 16-Bit STRB0 Memory

Table 4-4 shows a configuration with program memory mapped to 16-bit-wide
STRB0 or STRB1 memory, which is selected by hardwiring the PRGW pin to a
high state. The 32-bit data transfers to and from the 32-bit-wide external
memory do not involve any data manipulation in the memory interface.

The short program stored in STRB0 memory begins with the LDI instruction
reading a 32-bit integer from 32-bit-wide IOSTRB memory and loading it into R0
(see Figure 4-13). Next, the FLOAT instruction converts the integer in R0 to
a 40-bit floating-point number and loads it into R1. Finally, the STF instruction
truncates the 40-bit contents of R1 to 32 bits and stores the result in the
32-bit-wide STRB1 memory. The data is not modified as it passes through the
memory interface.

The program controlling the data conversion in this example is stored in the
16-bit-wide memory bank mapped to STRB0. As discussed earlier, program
fetch cycles do not reference the strobe control register to determine the width
of the program memory. Instead, the memory interface checks the state of the
PRGW pin to determine the memory width. Because the program memory is
16 bits wide, the PRGW pin should be pulled up to VDD, directing the memory
interface to fetch each instruction in two bus cycles (16 bits at a time).


Table 4-4. Program Fetch From 16-Bit STRB0 Memory

Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     IOSTRB    32                  32
Output data    STRB1     32                  32
Program        STRB0     32                  16
Figure 4-13. Program Fetch From 16-Bit STRB0 Memory

4.6.1.4 Program Fetch From 32-Bit STRB1 Memory

Table 4-5 shows a configuration with program memory mapped to 32-bit-wide
STRB0 or STRB1 memory, which is selected by hardwiring the PRGW pin to a
low state. The 32-bit data transfers to and from the 32-bit-wide external
memory do not involve any data manipulation in the memory interface.

The small program stored in STRB1 memory begins with the LDI instruction
reading a 32-bit integer from 32-bit-wide STRB0 memory and loading it into
R0 (see Figure 4-14). Next, the FLOAT instruction converts the integer in R0
to a 40-bit floating-point number and loads it into R1. Finally, the STF
instruction truncates the 40-bit contents of R1 to 32 bits and stores the result
in the 32-bit-wide IOSTRB memory. The data is not modified as it passes
through the memory interface.

The program controlling the data conversion in this example is stored in the
32-bit-wide memory bank mapped to STRB1. Program fetch cycles do not
reference the strobe control register to determine the width of the program
memory. Instead, the memory interface checks the state of the PRGW pin to
determine the memory width. Because the program memory is 32 bits wide,
the PRGW pin should be grounded, directing the memory interface to fetch
each instruction in one bus cycle (32 bits at a time).


Table 4-5. Program Fetch From 32-Bit STRB1 Memory

Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0     32                  32
Output data    IOSTRB    32                  32
Program        STRB1     32                  32
Figure 4-14. Program Fetch From 32-Bit STRB1 Memory

4.6.2 Logical Versus Physical Address

The ’C32 is a 32-bit processor. Its instruction set operates on 32-bit registers;
the CPU itself never reads or writes 8- or 16-bit data. When a ’C32 instruction
writes to memory, it sends all 32 bits of data to the memory interface unit
through an internal bus. Only in the memory interface can the internal 32-bit
data assume 8-bit or 16-bit form, and only if the address is in the STRB0 or
STRB1 range of the memory map. The data size field of the STRB0 or STRB1
control register determines the actual size of the data portion that is placed
on the external memory bus of the ’C32. Likewise, when a ’C32 instruction
reads data from external memory, the memory interface always converts it to
32 bits as it enters the chip. What happens to the external data as it goes
through the memory interface on the way to the CPU depends on the contents
of the STRB0 and STRB1 control registers. Again, only data whose address
falls within the STRB0 or STRB1 range of the memory map can be manipulated
inside the memory interface unit.

Throughout this document, the term logical address applies to a memory
location that is referenced by ’C32 instructions; the logical address is part of
the processor’s logical memory map. The term physical address refers to the
address that appears at the ’C32 address pins. The valid ranges of the logical
memory map that program instructions can reference are determined by:

□ The external memory available in the system 

□ The manner in which the external memory address pins are matched with 
the ’C32 address pins (which depends on physical memory width) 

□ The contents of the STRB0 and STRB1 control registers (which define
physical memory width and data size)

The logical memory map shown in Figure 4-15 always contains 32-bit data as
far as the CPU is concerned. Only when the data passes through the
memory-interface block can the data size actually change to 8 or 16 bits, as
directed by the appropriate strobe control register. For example, when the
processor reads a byte (eight bits) from external memory, the 8-bit data is
sign-extended or padded with 0s as it passes through the memory interface so
that it becomes 32-bit data inside the ’C32. Likewise, when the processor
writes the contents of a 32-bit register to 16-bit-wide external memory, the
internal 32-bit data is truncated to 16 bits as it passes through the memory
interface. The dashed lines inside the logical memory map in Figure 4-15 show
the internal 32-bit representation of external data that has a physical size of
8 or 16 bits.
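A minimal C sketch of the size conversion performed in the memory interface for integer data follows. The function names are hypothetical, and the choice between sign extension and zero padding is passed in as a flag because this section does not say which strobe control register field selects it; treat that as an assumption to check against the register description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read path (hypothetical model): an 8- or 16-bit value from external
 * memory becomes a 32-bit value inside the 'C32, either sign-extended
 * or padded with 0s. */
static uint32_t widen_to32(uint32_t raw, unsigned data_size, bool sign_extend)
{
    assert(data_size == 8 || data_size == 16 || data_size == 32);
    if (data_size == 32)
        return raw;
    uint32_t mask = (1u << data_size) - 1u;
    uint32_t sign = 1u << (data_size - 1u);
    raw &= mask;
    if (sign_extend && (raw & sign))
        raw |= ~mask;                 /* copy the sign bit upward */
    return raw;
}

/* Write path: a 32-bit register value is truncated to the data size. */
static uint32_t narrow_from32(uint32_t value, unsigned data_size)
{
    assert(data_size == 8 || data_size == 16 || data_size == 32);
    return data_size == 32 ? value : value & ((1u << data_size) - 1u);
}
```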

Figure 4-15 explains logical/physical addresses and other terms related to the 
’C32 memory interface. 
Figure 4-15. Description of Terms Involved in TMS320C32 Memory Interface

4.6.3 32-Bit Memory Configuration Design Examples

The following sections describe examples of interfacing the ’C32 to
32-bit-wide external memory from both the hardware and software-addressing
viewpoints.

4.6.3.1 32-Bit Memory Address Translation for Data Size = Memory Width 

When both data size and memory width are 32 bits, the STRB0 memory
interface behaves like the IOSTRB memory interface. The only difference
between the two is the number of strobe lines connected to the respective
memory banks: four for STRB0 and one for IOSTRB.

Figure 4-16 is a schematic diagram of a 32-bit interface consisting of two
memory banks, each controlled by a separate strobe. The four signal lines of
STRB0 are assigned to the chip-select pins of four 32K x 8 15-ns SRAMs. The
single IOSTRB signal line is connected to the chip-enable pins of four
32K x 8 30-ns EPROMs. For the 60-MHz version of the ’C32, the 15-ns
SRAMs operate with zero wait states and the 30-ns EPROMs require one wait
state. (Software wait states can be programmed in the strobe control
registers.)

The hardware memory configuration is depicted in Figure 4-16. Figure 4-17
illustrates the programmer’s view of the hardware memory configuration. The
logical addresses (appearing in program instructions) are represented in the
context of the entire memory map to identify the respective strobes. The
physical addresses are the values that actually appear at the pins of the
processor. Since IOSTRB operates exclusively on 32-bit data types, the
memory interface does not modify the address going in and out of the CPU;
the logical and physical addresses are identical. In this example, STRB0 also
operates on 32-bit data, since the data size field of the STRB0 control register
contains a binary value of 11. Since the STRB0 physical memory width is also
32 bits (see the memory width field in Figure 4-17), there is no need for
address translation from the logical address to its physical representation.
Figure 4-16. 32-Bit Memory Configuration (STRB0 and IOSTRB)

Figure 4-17. 32-Bit Memory Configuration (STRB0 and IOSTRB)

4.6.3.2 32-Bit Memory Address Translation for Data Size < Memory Width

A single 32-bit memory location can store two 16-bit or four 8-bit data values.
Therefore, if the data requires only 16 or 8 bits of precision, the effective
addressing range of the same physical 32-bit memory is doubled or quadrupled
simply by changing the data size field of the appropriate strobe control
register before the transfers begin. The logical-to-physical address translation
involves a 2-bit address shift if the data size is 8 bits and a 1-bit shift if the
data size is 16 bits. The memory interface automatically performs the address
shifts and activates the selected external memory bytes with the appropriate
strobe control lines (as directed by the strobe control registers).

Figure 4-18 is the schematic diagram of a 32-bit interface consisting of two
memory banks, each controlled by a separate strobe. The four signal lines of
STRB0 are assigned to the chip-select pins of four 32K x 8 15-ns SRAMs, and
the four signal lines of STRB1 are connected to the chip-enable pins of four
32K x 8 30-ns EPROMs. For the 60-MHz version of the ’C32, the 15-ns SRAMs
operate at zero wait states and the 30-ns EPROMs require one wait state.
(Software wait states can be programmed in strobe control registers.)

Figure 4-19 illustrates the programmer’s view of the hardware memory
configuration depicted in Figure 4-18. The logical addresses (appearing in
program instructions) are represented in the context of the entire memory map
to identify the respective strobes. In this case, the STRB0 memory transfers
operate on 16-bit data to and from 32-bit-wide memory, as defined in the STRB0
control register. STRB1 accesses 8-bit data to and from 32-bit-wide memory,
as defined by the STRB1 control register. Since two 16-bit data values can fit
in a single 32-bit-wide memory location referenced by a single physical
address, a mechanism is needed to distinguish between the 16-bit data portions.
This is accomplished by using the least significant bit (LSB) of the logical
address to activate a different pair of the four STRB0 signal lines for each
access, leaving the second LSB of the logical address to become the LSB of the
physical address and effectively shifting the logical address by one bit.
Similarly, STRB1 8-bit data transfers to the 32-bit-wide external memory cause
the address to be shifted by two bits, because the two LSBs of the logical
address are used to select one out of four bytes sharing the same physical
32-bit memory location.
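The logical-to-physical translation for data narrower than the memory can be written as a small C sketch. `translate_narrow` and the `narrow_xlate` type are hypothetical; `mem_addr` is the address seen by the memory device, and `slot` is the data portion within the memory location (for 16-bit data in 32-bit memory, slot 0 corresponds to STRBx_B0/B1 and slot 1 to STRBx_B2/B3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: address seen by the memory device and the data
 * portion (slot) selected inside that memory location. */
typedef struct {
    uint32_t mem_addr;
    unsigned slot;
} narrow_xlate;

/* Data size < memory width: the low log2(mem_width / data_size) bits of the
 * logical address pick the slot (via the STRBx_Bn lines); the remaining bits,
 * shifted right, become the address. */
static narrow_xlate translate_narrow(uint32_t logical, unsigned data_size,
                                     unsigned mem_width)
{
    assert(data_size == 8 || data_size == 16);
    assert((mem_width == 16 || mem_width == 32) && data_size < mem_width);
    unsigned ratio = mem_width / data_size;       /* 2 or 4 portions */
    unsigned shift = (ratio == 4u) ? 2u : 1u;     /* log2(ratio)     */
    narrow_xlate t = { logical >> shift, logical & (ratio - 1u) };
    return t;
}
```

Plugging in the addresses used in section 4.6.5.1, 887FFFh with 16-bit data gives 443FFFh and slot 1, and 98FFFFh with 8-bit data gives 263FFFh and slot 3.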
Figure 4-18. 32-Bit Memory Configuration (STRB0 and STRB1)

Figure 4-19. 32-Bit Memory Address Translation: Data Size < Memory Width

4.6.4 16-Bit and 8-Bit Memory Configuration Design Examples

This section describes how to interface the ’C32 to both 8- and 16-bit-wide
external memories in the same design from both the hardware and
software-addressing perspectives.

Figure 4-20 contains a schematic diagram of the external memory interface
consisting of two banks, each controlled by a separate strobe. Two of the four
STRB0 signal lines are assigned to the chip-select pins of two 32K x 8 15-ns
SRAMs; one of the four STRB1 signals is connected to the chip-enable pin of
one 32K x 8 30-ns EPROM. For the 60-MHz version of the ’C32, the 15-ns
SRAMs operate at zero wait states and the 30-ns EPROM requires one wait
state. (Software wait states can be programmed in strobe control registers.)
Whenever the external memory is less than 32 bits wide, some of the strobe
pins switch functions and become additional address pins. For 16-bit-wide
memory, STRB0_B3 becomes A–1; for 8-bit-wide memory, STRB1_B3 and
STRB1_B2 become A–1 and A–2, respectively. This is the only external change
that differentiates the 32-bit-wide memory interface from the 16- and 8-bit-wide
memory interfaces. This feature is transparent to the software programmer,
except that the programmer must configure the strobe control registers
appropriately. The memory interface automatically drives the additional
address lines with the correct values, depending on the size of the data being
transferred.

The following three sections illustrate how the physical addresses are derived
from the logical addresses when the data size is equal to, greater than, and
less than the width of the physical memory. Though address translation is
completely automatic, these cases provide insight into the range of physical
addresses actually affected during transfers of 32-, 16-, and 8-bit data.
Figure 4-20. 16-Bit and 8-Bit Memory Configuration: A Complete Minimum Design

Note: The EPROM is connected for data access (shifted address) and not for boot table access. This system is booted from the serial port (see INT3 signal).

4.6.4.1 16-Bit and 8-Bit Memory Address Translation for Data Size = Memory Width

As shown in Figure 4-21, when the external memory width matches the size
of the data being transferred, the physical address also matches the logical
address with one exception: the physical address is shifted relative to the
logical address by one bit for 16-bit transfers and by two bits for 8-bit
transfers. This means that the address bit that would normally be expected on
pin A0 actually appears on pin A–1 or A–2. As Figure 4-21 shows, there is a
one-to-one correspondence between logical data and its counterpart in
physical memory.
Figure 4-21. 16-Bit and 8-Bit Memory Address Translation: Data Size = Memory Width

4.6.4.2 16-Bit and 8-Bit Memory Address Translation for Data Size > Memory Width

Figure 4-22 depicts what happens when the data being transferred is larger
than the physical memory in which it is to reside. As shown by the contents of
the strobe control registers, STRB0 controls transfers of 32-bit data to and
from 16-bit-wide physical memory, and STRB1 controls transfers of 16-bit data
to and from byte-wide memory. When an instruction stores 32-bit data to logical
address 0h, the memory interface must perform two write cycles to 16-bit-wide
external memory. These two write cycles involve two consecutive addresses,
0h and 1h. Likewise, a 16-bit portion of data logically referenced with a single
address actually requires two physical addresses to be stored in 8-bit-wide
physical memory (as is the case with the STRB1 transfer shown at the bottom
of Figure 4-22). To implement these extra bus cycles, the memory interface
appends an extra address bit to the least significant end of both addresses. As
in section 4.6.4.1, the LSBs of the STRB0 and STRB1 addresses appear at
pins A–1 and A–2, respectively, because they address 16- and 8-bit-wide
memories.
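The address sequence for a transfer wider than the memory can be sketched in C as follows. `wide_access_addresses` is a hypothetical helper; it follows this section's description by appending log2(data size / memory width) bits to the logical address, which is the same as multiplying it by the number of bus cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a transfer whose data size exceeds the memory width.
 * Fills out[] with the memory addresses of each bus cycle (least significant
 * portion first) and returns the number of cycles. */
static unsigned wide_access_addresses(uint32_t logical, unsigned data_size,
                                      unsigned mem_width, uint32_t out[4])
{
    assert(data_size == 16 || data_size == 32);
    assert((mem_width == 8 || mem_width == 16) && data_size > mem_width);
    unsigned cycles = data_size / mem_width;      /* 2 or 4            */
    for (unsigned k = 0; k < cycles; ++k)
        out[k] = logical * cycles + k;            /* append low bits   */
    return cycles;
}
```

Storing 32-bit data at logical address 0h in 16-bit-wide memory yields the two addresses 0h and 1h described above.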
Figure 4-22. 16-Bit and 8-Bit Memory Address Translation: Data Size > Memory Width

4.6.4.3 16-Bit and 8-Bit Memory Address Translation for Data Size < Memory Width

The example in Figure 4-23 is, in a way, the inverse of that in Figure 4-22:
8-bit data is transferred to and from 16-bit-wide external memory. To put this
example in perspective, assume that the data transfer is triggered by the
following ’C32 instruction: STI R0,@7FFFh. While in R0, the data is 32 bits
wide, but when it arrives at the memory interface, the data size field of the
STRB0 control register indicates 8-bit data. So, the 32-bit data is truncated to
8 bits. The now byte-sized data is written to logical address 7FFFh in the
16-bit-wide external memory. In this case, the LSB of the logical address (as
referenced by the instruction) is actually rerouted to control one of the two
STRB0 lines assigned to the 16-bit physical memory. If the LSB is 1 (as in this
case), STRB0_B1 is asserted during the write cycle. If the LSB is 0, STRB0_B0
is asserted during the write cycle. The remaining bits of the original logical
address are placed on the external address bus starting at pin A–1 (because
the memory width is 16 bits).

4.6.4.4 Design Considerations 

While designing the external memory interface to the ’C32, a hardware
engineer must remember to match address pin A–1 with the A0 pin of a
16-bit-wide memory, or to match address pin A–2 with the A0 pin of a
byte-wide memory. If the external memory is 32 bits wide, the pins are not
shifted relative to each other and, therefore, match perfectly at A0.

When writing code for the ’C32, the programmer does not have to be
concerned about the structure of the physical memory. The programmer must
simply be aware of the logical memory map and the configuration of the two
strobe control registers. The ’C32 memory interface automatically performs all
of the address translation and byte packing/unpacking necessary to match
variable-size data with physical memories of different widths; these operations
are controlled by the data size and memory width fields of the STRB0 and
STRB1 control registers.
Figure 4-23. 16-Bit and 8-Bit Memory Address Translation: Data Size < Memory Width

4.6.5 One Bank/Two Strobes (32-Bit-Wide Memory) Design Examples

This section describes how to use two strobes to interface the ’C32 to a single
physical bank of memory. Such a configuration enables access to 32-bit
programs and to two differently sized portions of data out of the same bank of
memory with no speed penalty. This feature is implemented by internally
ANDing STRB0 and STRB1 and outputting the combined strobes on STRB0 (a
total of four lines). The one bank/two strobes memory configuration is useful in
systems in which, for example, a program requiring 32-bit instruction words for
maximum execution speed operates on data that needs only 16 bits of
precision (see Figure 4-27 on page 4-56).

Figure 4-24 is the schematic diagram of a 32-bit-wide external memory
configuration arranged as one bank with two separate logical control strobes
sharing the same STRB0 physical signal lines. The four STRB0 signals are
assigned to the chip-select pins of four 32K x 8 15-ns SRAMs, one signal per
chip. For the 60-MHz version of the ’C32, the 15-ns SRAMs operate at zero
wait states. (For slower devices, additional software wait states can be
programmed in the appropriate fields of the strobe control registers.) Because
the total memory width is 32 bits, there is no mismatch between the processor’s
and the memory’s address pins. Therefore, ’C32 pin A0 is matched with
memory pin A0, A1 is matched with A1, and so on. As mentioned earlier, both
STRB0 and STRB1 signals appear together on the four STRB0 control pins.
This behavior is selected by setting the strobe configuration bit of the STRB0
control register to 1 (see Figure 4-24). Since STRB0 and STRB1 are mapped
to different ranges of the logical memory map, the strobe that actually
appears on the physical STRB0 pins depends on the internal address of the
data or program being accessed. The two strobes effectively split the physical
memory into two halves, with the high memory address bit selecting either the
STRB0 or STRB1 address space. For example, if all program instructions are
fetched from logical addresses 880000h-881000h and all data reads/writes are
confined to 980000h-981000h, the program fetches are associated with
STRB0 and all data accesses are driven by STRB1 (see Figure 4-10 on
page 4-23 for strobe/memory mapping). Since the behavior of each strobe is
determined by a different control register, the program fetches and the data
reads/writes can differ in the number of STRB0 lines that are simultaneously
driven and in the number of bus cycles required per access. This is shown on
the following pages.
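The internal AND of the two active-low strobes can be modeled as follows. This is a sketch with a hypothetical function name; each 4-bit argument holds the levels of STRBx_B3..B0 (0 = asserted), and because only one strobe is active in a given bus cycle, the output simply follows whichever strobe is active.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of STRB CONFIG = 1: the active-low STRB0 and STRB1 lane
 * levels are ANDed, and the result drives the four physical STRB0 pins.
 * Bit n holds the level of STRBx_Bn (0 = asserted, 1 = idle). */
static uint8_t combined_strb0_pins(uint8_t strb0_n, uint8_t strb1_n)
{
    assert((strb0_n & ~0xFu) == 0u && (strb1_n & ~0xFu) == 0u);
    return (uint8_t)(strb0_n & strb1_n);
}
```

For instance, a 16-bit STRB0 access asserting lanes 2 and 3 (levels 0x3) combined with an idle STRB1 (0xF) leaves 0x3 on the pins, while an 8-bit STRB1 access asserting lane 3 (0x7) with STRB0 idle leaves 0x7.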
Figure 4-24. One Bank/Two Strobes Memory Configuration: Memory Width = 32 Bits

4.6.5.1 One Bank/Two Strobes Address Translation for Data Size = 16 and 8 Bits

Figure 4-25 illustrates how a single physical block of memory can be split into
two separate logical halves, one with 16-bit data and the other with 8-bit data.
Access to each half is controlled by a separate strobe control register with
corresponding memory width and data size fields. Another STRB0 control
register field, STRB CONFIG (strobe configuration), is set to 1 to indicate that
both STRB0 and STRB1 are mapped to the same set of four STRB0 pins. The
high memory address pin (in this case, A14) selects between the two halves of
the memory. For this example, the ’C32 address pin A17 drives the memory
pin A14.

The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also
depends on the logical/physical address shift as determined by the size of the
program/data that is being accessed. In this case, the logical STRB0 address
range drives the physical address bit A17 to 0 (after accounting for a 1-bit
address shift due to the 16-bit width of the data). Similarly, the logical STRB1
range drives the physical address bit A17 to 1 (after accounting for a 2-bit
address shift due to the 8-bit width of the data). The logical STRB0 and STRB1
address ranges selected to drive the physical address pin A17 to 0 and 1,
respectively, must still conform to the logical memory map that assigns fixed
blocks of addresses to different strobe spaces.

An STI R0,*AR0 instruction (with AR0 = 887FFFh) results in a STRB0 data
access (data size = 16 bits) driving the STRB0_B2 and STRB0_B3 control pins
to write the contents of the 32-bit register R0 into a 16-bit data location in the
lower half of the external memory addressed by 3FFFh. Similarly, an LDI
*AR1,R1 instruction (with AR1 = 98FFFFh) results in a STRB1 data access
(data size = 8 bits) driving the STRB0_B3 control pin (STRB CONFIG = 1) to
read the contents of an 8-bit data location in the upper half of the external
memory addressed by 7FFFh into the 32-bit R1 register. The ’C32 automatically
performs all address translation; the programmer needs to keep track of only
the logical memory map and the two strobe control registers.
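The two addresses quoted above can be reproduced with a short C sketch of this particular wiring. `one_bank_memory_address` is hypothetical and hard-codes the example's connections: memory A13–A0 driven by ’C32 A13–A0 and memory A14 driven by ’C32 A17.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the one-bank example: logical address -> address
 * seen by the 32K-location memory. The logical-to-physical shift is 0, 1,
 * or 2 bits for 32-, 16-, or 8-bit data in this 32-bit-wide bank. */
static uint32_t one_bank_memory_address(uint32_t logical, unsigned data_size)
{
    unsigned shift;
    switch (data_size) {
    case 32: shift = 0; break;
    case 16: shift = 1; break;
    case 8:  shift = 2; break;
    default: assert(!"unsupported data size"); return 0;
    }
    uint32_t physical = logical >> shift;          /* value on 'C32 pins   */
    uint32_t a17 = (physical >> 17) & 1u;          /* selects bank half    */
    return (physical & 0x3FFFu) | (a17 << 14);     /* memory A14..A0       */
}
```

With these assumptions, 887FFFh (16-bit data) maps to 3FFFh in the lower half and 98FFFFh (8-bit data) maps to 7FFFh in the upper half, and the 32-bit access at 883FFFh used in section 4.6.5.2 also lands at 3FFFh.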
Figure 4-25. One Bank/Two Strobes Address Translation: Data Size = 16 and 8 Bits

4.6.5.2 One Bank/Two Strobes Address Translation for Data Size = 32 and 8 Bits

Figure 4-26 illustrates how a single physical block of memory can be split into
two separate logical halves, one with 32-bit data and the other with 8-bit data.
Access to each half is controlled by a separate strobe control register with
corresponding memory width and data size fields. Another STRB0 control
register field, STRB CONFIG, is set to 1 to indicate that both STRB0 and
STRB1 are mapped to the same set of four STRB0 pins. The high memory
address pin (in this case, A14) selects between the two halves of the memory.
For this example, the ’C32 address pin A17 drives the memory pin A14.

The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also
depends on the logical/physical address shift as determined by the size of the
program/data that is being accessed. In this case, the logical STRB0 address
range drives the physical address bit A17 to 0. Similarly, the logical STRB1
range drives the physical address bit A17 to 1 (after accounting for a 2-bit
address shift due to the 8-bit width of the data). Additionally, the logical STRB0
and STRB1 address ranges that drive the physical address pin A17 to 0 and 1,
respectively, must still conform to the logical memory map that assigns fixed
blocks of addresses to different strobe spaces.

An STI R0,*AR0 instruction (with AR0 = 883FFFh) results in a STRB0 data
access (data size = 32 bits) driving the STRB0_B0, STRB0_B1, STRB0_B2, and
STRB0_B3 control pins to write the contents of the 32-bit register R0 into a
32-bit data location in the lower half of the external memory addressed by
3FFFh. Similarly, an LDI *AR1,R1 instruction (with AR1 = 98FFFFh) results in
a STRB1 data access (data size = 8 bits) driving the STRB0_B3 control pin
(because STRB CONFIG = 1) to read the contents of an 8-bit data location in
the upper half of the external memory addressed by 7FFFh into the 32-bit R1
register. The ’C32 performs all address translation automatically; the
programmer needs only to keep track of the logical memory map and the two
strobe control registers.
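The address arithmetic in this example can be checked with a short C sketch. It assumes, as the data-size rules in section 4.7 state, that the physical address is the logical address shifted right by 2 for 8-bit data, by 1 for 16-bit data, and not at all for 32-bit data. It also assumes a hypothetical wiring in which ’C32 pins A13-A0 drive memory pins A13-A0 while ’C32 A17 drives memory pin A14. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift applied to the logical address for each data size
 * (8-bit: 2, 16-bit: 1, 32-bit: 0), per the data-size rules in section 4.7. */
unsigned data_shift(unsigned data_bits)
{
    assert(data_bits == 8 || data_bits == 16 || data_bits == 32);
    return data_bits == 8 ? 2u : (data_bits == 16 ? 1u : 0u);
}

/* Address presented on the 24-bit physical bus A23-A0. */
uint32_t physical_address(uint32_t logical, unsigned data_bits)
{
    return (logical >> data_shift(data_bits)) & 0xFFFFFFu;
}

/* Address seen by the 32K-location memory in this example.
 * ASSUMPTION (hypothetical wiring): 'C32 A17 -> memory A14,
 * 'C32 A13-A0 -> memory A13-A0. */
uint32_t memory_address(uint32_t logical, unsigned data_bits)
{
    uint32_t phys = physical_address(logical, data_bits);
    uint32_t half = (phys >> 17) & 1u;   /* A17 selects upper or lower half */
    return (half << 14) | (phys & 0x3FFFu);
}
```

Under these assumptions, memory_address(0x883FFF, 32) gives 3FFFh and memory_address(0x98FFFF, 8) gives 7FFFh, matching the two instructions above. The same function also reproduces the 16/32-bit case in section 4.6.5.3 (887FFFh with 16-bit data and 923FFFh with 32-bit data).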
Figure 4-26. One Bank/Two Strobes Address Translation: Data Size = 32 and 8 Bits
4.6.5.3 One Bank/Two Strobes Address Translation for Data Size = 16 and 32 Bits

Figure 4-27 illustrates how a single physical block of memory can be split into 
two separate logical halves, one with 16-bit data and the other with 32-bit data. 
The access to each half is controlled by a separate strobe control register with 
corresponding memory width and data size fields. Another STRB0 control register
field, STRB CONFIG, is set to 1 to indicate that both STRB0 and STRB1
are mapped to the same set of four STRB0 pins. The high memory address
pin (in this case, A14) selects between the two halves of the memory. For this 
example, the ’C32 address pin A17 drives the memory pin A14. 

The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also depends
on the logical/physical address shift as determined by the size of the program/
data that is being accessed. In this case, the logical STRB0 address range
drives the physical address bit A17 to 0 (after accounting for a 1-bit address
shift due to the 16-bit data size). Similarly, the logical STRB1 range
drives the physical address bit A17 to 1. The logical STRB0 and STRB1 address
ranges that drive the physical address pin A17 to 0 and 1, respectively,
must still conform to the logical memory map that assigns fixed blocks of
addresses to different strobe spaces.

An STI R0,*AR0 instruction (with AR0 = 887FFFh) results in a STRB0 data
access (data size = 16 bits) driving the STRB0_B2 and STRB0_B3 control pins
to write the contents of the 32-bit register R0 into a 16-bit data location in the
lower half of the external memory addressed by 3FFFh. Similarly, an LDI
*AR1,R1 instruction (with AR1 = 923FFFh) results in a STRB1 data access
(data size = 32 bits) driving the STRB0_B0, STRB0_B1, STRB0_B2, and
STRB0_B3 control pins (because STRB CONFIG = 1) to read the contents of
a 32-bit data location in the upper half of the external memory addressed by
7FFFh into the 32-bit R1 register. The ’C32 performs all address translation
automatically; the programmer needs only to keep track of the logical memory
map and the two strobe control registers.
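Which STRB0_Bx pins are driven in a 32-bit-wide memory follows from the low-order logical address bits. The sketch below assumes that the lowest logical address within a 32-bit word maps to byte lane B0. This assumption is consistent with both examples here and in section 4.6.5.2: 887FFFh as 16-bit data drives B2 and B3, and 98FFFFh as 8-bit data drives B3. The helper is hypothetical, not a TI API.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask in which bit i set means STRBx_Bi is driven active (low)
 * for an access to 32-bit-wide memory.
 * ASSUMPTION: the lowest logical address in a word maps to lane B0. */
unsigned byte_enables_32bit_memory(uint32_t logical, unsigned data_bits)
{
    assert(data_bits == 8 || data_bits == 16 || data_bits == 32);
    if (data_bits == 32)
        return 0xFu;                          /* B3..B0: whole word */
    if (data_bits == 16)
        return 0x3u << (2u * (logical & 1u)); /* B1:B0 or B3:B2 */
    return 1u << (logical & 3u);              /* a single byte lane */
}
```

For 887FFFh with 16-bit data the mask is Ch (B3 and B2). For 98FFFFh with 8-bit data it is 8h (B3 only). Both results agree with the pins named in the text.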
Figure 4-27. One Bank/Two Strobes Address Translation: Data Size = 16 and 32 Bits
4.6.5.4 Example Summary

The one bank/two strobes memory interface to the ’C32 supports any
combination of data size pairs (16/8, 32/8, and 16/32 bits) with no speed penalty.
(The strobe control registers do not have to be reconfigured each time the data 
size changes.) Likewise, 16-bit external memory can be divided into two 
halves, each containing data of a different size (8, 16, or 32 bits). The same 
holds true for 8-bit external memory. All address translation information given 
in section 4.6.1 through section 4.6.4 also applies to the one bank/two strobes 
examples. 

To configure the external memory for one bank/two strobes access mode, use 
the following steps: 

1) Set the strobe configuration field in the STRB0 control register to 1.

2) Set the memory width field in both the STRB0 and STRB1 control registers
to reflect the width of the physical memory.

3) Set the data size field in both the STRB0 and STRB1 control registers to
reflect the size of the data portions chosen for each strobe.

4) Choose one of the high physical address bits to split the physical memory
into two halves.

5) For the two memory halves, choose the STRB0 and STRB1 logical address
ranges to drive the chosen bit to 0 and 1, respectively. The chosen
STRB0 and STRB1 address ranges must fit inside the legal STRB0/STRB1
address spaces, as defined by the memory map.
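For step 5, the logical address bit that lands on the chosen physical bit moves up by the same shift that the data size imposes. The following is a minimal sketch that assumes the shift rule given in section 4.7 (a right shift of 2, 1, or 0 for 8-, 16-, or 32-bit data). Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical address bit that maps onto physical address bit phys_bit.
 * ASSUMPTION: logical address is shifted right by 2, 1, or 0 bits
 * for 8-, 16-, or 32-bit data (section 4.7). */
unsigned logical_bit_for(unsigned phys_bit, unsigned data_bits)
{
    assert(data_bits == 8 || data_bits == 16 || data_bits == 32);
    unsigned shift = (data_bits == 8) ? 2u : (data_bits == 16) ? 1u : 0u;
    return phys_bit + shift;
}

/* Level (0 or 1) that a logical address drives onto the chosen physical bit. */
unsigned split_level(uint32_t logical, unsigned phys_bit, unsigned data_bits)
{
    return (unsigned)(logical >> logical_bit_for(phys_bit, data_bits)) & 1u;
}
```

In the 32/8-bit example of section 4.6.5.2, physical bit A17 corresponds to logical bit 19 for the 8-bit STRB1 half. That is why 98FFFFh, which has bit 19 set, selects the upper half.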


4.6.6 RDY Signal Generation 

The ’C32 uses the RDY pin to determine whether the current bus cycle finishes 
at the end of the current clock cycle or requires additional clock cycles to
complete. Even though the ’C32 can fetch instructions and access data in one
clock cycle, a slow memory may need additional clock cycles (wait states) to 
complete the bus cycle. The RDY signal can be handled in one of three ways: 


□ The RDY pin can be permanently grounded, indicating to the CPU that the 
external memory is always ready for the next cycle. This is used where all 
external memory is fast enough to preclude wait states. 

□ The wait states can be programmed in software by setting bits in the
corresponding strobe control registers, if there is only one device per strobe.
This method can be used even if there are external devices that require 
wait states. The RDY pin must be permanently grounded. 
□ The active generation of the RDY signal is required only if a single strobe
controls two or more external memory banks or peripherals requiring
different numbers of wait states.


The remainder of this section describes the active generation of the RDY
signal. The example involves three memory banks controlled by STRB0, each
requiring a different number of wait states. This example directly applies to RDY
signal generation involving STRB1 and is similar to the case of IOSTRB, which
involves a more relaxed set of timing parameters.


4.6.6.1 RDY Signal Timing Parameters for STRB0 and STRB1

Figure 4-28 and Table 4-6 contain STRB0 and STRB1 timing parameters that
are typically used to generate the RDY signal. As the read and write timing
waveforms show, the RDY signal generated by the external logic is clocked
into the ’C32 on the falling edge of the H1 clock. The associated setup time is
represented by parameter 17 and the hold time by parameter 18. Thus, for the
60-MHz ’C32, the RDY signal must arrive at the RDY pin at least 17 ns before
the falling edge of H1 and remain valid at least until H1 goes low. Timing
parameters 11 and 12 represent the STRB0 and STRB1 low and high delays from
the falling edge of H1. Timing parameter 14 represents the address valid delay
from the falling edge of H1. For back-to-back write cycles, timing parameter
22 represents the address valid delay from the rising edge of H1. Parameters
11, 12, 14, and 22 do not directly apply to RDY setup and hold, but are
nevertheless involved in the generation of the RDY signal.
Figure 4-28. RDY Signal Timing for STRB0 and STRB1 Cycles

[Timing waveforms: STRB0, STRB1 read cycle; STRB0, STRB1 write cycle]

Table 4-6. RDY Signal Generation

                                                      ’C32-40†   ’C32-50†   ’C32-60†
                                                      (50 ns)    (40 ns)    (33 ns)
No.  Parameter   Description                          Min  Max   Min  Max   Min  Max   Unit
11   td(H1L-SL)  Delay time, H1 low to STRBx low      0    11    0    9     0    8     ns
12   td(H1L-SH)  Delay time, H1 low to STRBx high     0    11    0    9     0    8     ns
14   td(H1L-A)   Delay time, H1 low to A valid        0    11    0    9     0    8     ns
17   tsu(RDY)    Setup time, RDY before H1 low        21         19         17         ns
18   th(RDY)     Hold time, RDY after H1 low          0          0          0          ns
22   td(H1H-A)   Delay time, H1 high to A valid on         11         9          8     ns
                 back-to-back write cycles (write)

† These timing specifications are subject to change without notice. See the TMS320C32 Digital Signal Processor data sheet
for current timing information.
4.6.6.2 RDY Signal Generation for STRB0 Signals

Figure 4-29 shows three memory banks controlled by a single strobe
(STRB0). The first bank is composed of four 8-bit-wide SRAMs requiring zero
wait states to operate at 60 MHz (15-ns devices). Bank 2 is composed of two
1-wait-state SRAMs, and bank 3 contains one 3-wait-state EPROM (which is
8 bits wide). The RDY pin is normally high, indicating a not-ready state. It goes
low if either RDY_BANK1 or RDY_BANK23 goes low.

The RDY_BANK1 signal is asserted only if two conditions are satisfied: 

□ At least one of the four STRB0 signal lines must be active.

□ The three address decode bits must match the bank 1 space. 

Since no wait states are involved, the RDY_BANK1 signal does not have to 
be synchronized with the H1/H3 clocks, and, therefore, it can directly drive the 
RDY pin after being gated with its bank 2/bank 3 counterpart. 

The STRB0_BANK23 signal becomes active (high) if the three address decode
bits match bank 2 or bank 3 address spaces while STRB0_B0 and/or
STRB0_B1 are active (low). The STRB0_BANK23 signal, when high, sets a
high data state in a synchronous progression through a chain of four registers.
Depending on which point in the chain is tapped, a RDY signal delay ranging
from zero to three wait states can be achieved. In this case, both 1-wait-state
and 3-wait-state taps assert the RDY_B23YES signal to reflect bank 2 or bank
3 access. Finally, a 2-register circuit removes the trailing active-low edge of the
RDY_B23YES signal by ORing it with RDY_B23NOT (see Figure 4-30). The
resulting RDY_BANK23 is ANDed with its bank 1 counterpart to drive the RDY
pin.
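The logic described above can be summarized by a cycle-level behavioral model in C. This model is not a gate-level description of Figure 4-29. The register chain is abstracted into a count of elapsed wait states. The bank decode assumes the A23, A18, and A17 assignments given in section 4.6.7: A23 = 0, and A18-A17 = 01, 10, and 11 for banks 1, 2, and 3. All names are hypothetical. Levels are electrical (0 = low), so a return value of 0 means the processor sees RDY asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of one bus cycle (behavioral, not gate-level). */
typedef struct {
    uint32_t addr;        /* physical address pins A23-A0 */
    unsigned strb_n;      /* bit i = level of STRB0_Bi (active low) */
    unsigned waits_done;  /* wait states already inserted in this cycle */
} bus_state;

/* ASSUMPTION: bank decode from section 4.6.7 (A23 = 0, A18:A17 = bank code). */
static int bank_of(uint32_t addr)
{
    if (addr & 0x800000u)
        return 0;                        /* A23 = 1: outside these banks */
    return (int)((addr >> 17) & 3u);     /* 1, 2, or 3; 0 = no bank */
}

/* Returns the RDY pin level: 0 = ready (asserted), 1 = not ready. */
unsigned rdy_level(const bus_state *s)
{
    assert(s);
    int bank = bank_of(s->addr);
    bool any_strobe = (s->strb_n & 0xFu) != 0xFu;  /* any of B3..B0 low */
    bool low_strobe = (s->strb_n & 0x3u) != 0x3u;  /* B1 or B0 low */

    /* Bank 1: zero wait states, purely combinational. */
    unsigned rdy_bank1 = (any_strobe && bank == 1) ? 0u : 1u;

    /* Banks 2 and 3: ready after 1 or 3 wait states (delay-chain taps). */
    unsigned needed = (bank == 2) ? 1u : 3u;
    unsigned rdy_bank23 = (low_strobe && (bank == 2 || bank == 3)
                           && s->waits_done >= needed) ? 0u : 1u;

    /* ANDing the two active-low terms drives RDY low if either is low. */
    return rdy_bank1 & rdy_bank23;
}
```

With the strobes inactive (strb_n = Fh), the model returns 1, which matches the text's statement that RDY is normally high.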
Figure 4-29. RDY Signal Generation for STRB0 Cycles
Figure 4-30 contains timing waveforms for RDY signal generation. It illustrates
how the RDY signal is generated for a series of external back-to-back
memory read cycles in which the first cycle accesses bank 1 (zero wait states),
the second cycle accesses bank 2 (one wait state), the third cycle accesses
bank 3 (three wait states), and the fourth and fifth cycles access bank 1 (zero
wait states). For each read cycle, the RDY waveform is marked with the resulting
setup time. For the 60-MHz device, the RDY signal must become valid at least
17 ns before every falling edge of the H1 clock.

In the 0-wait-state cycle, the address and strobe signals become valid 8 ns
after the falling edge of H1. An additional 5 ns is needed for a single pass
through a fast combinational logic device for a total setup time of the resulting 
RDY signal equal to 20 ns. This leaves 3 ns for board delays and a modest 
safety factor. 

For the 1- and 3-wait-state cycles, the bank decode and strobe signals do not
directly drive the RDY signal. Instead, they are combined into the
STRB0_BANK23 signal, which, when active, releases the clear condition on the
3-register delay chain driven by the H3 clock. The register chain is then free
to propagate a high state at the rate of one register per clock cycle. The two
taps in the register chain (at the first and third registers, representing one wait
state and three wait states, respectively) are ORed with their corresponding
bank select signals to produce the RDY_B23YES signal, which is synchronous to
the H1/H3 clocks. The 10-ns delay of the RDY_B23YES leading edge is caused by two
passes through a fast PAL® device (such as a 22V10). The trailing edge of this
signal is caused by the bank 2 or bank 3 decode circuits going inactive after the
RDY signal is recognized by the processor. The address decode (8 ns) plus two
passes through the PAL (5 + 5 ns) combine for a total delay of 18 ns, which, if not
modified, cuts into the next cycle’s RDY setup requirement (33 - 18 = 15 ns, less
than the 17 ns required). To deactivate the RDY signal sooner, a single-register
circuit is added to generate RDY_B23NOT, which, when ORed with RDY_B23YES,
yields the RDY_BANK23 signal that satisfies the RDY setup time for the next cycle.
Finally, RDY_BANK1 and RDY_BANK23 are ANDed together to produce the
final RDY signal that is wired to the processor’s RDY pin.
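The setup-time budgets in the two paragraphs above are simple subtractions from the 33-ns H1 period of the 60-MHz device. The sketch below uses hypothetical function names and reproduces both cases. The first is the 0-wait-state path (8 ns + 5 ns). The second is the unmodified trailing edge of RDY_B23YES (8 ns + 5 ns + 5 ns). Each is checked against the 17-ns setup requirement (parameter 17 in Table 4-6).

```c
#include <assert.h>

/* Setup time available before the next H1 falling edge (ns): the H1 period
 * minus the delay from the preceding H1 falling edge to a valid RDY. */
int rdy_setup_available(int h1_period_ns, int signal_delay_ns)
{
    assert(h1_period_ns > 0 && signal_delay_ns >= 0);
    return h1_period_ns - signal_delay_ns;
}

/* Margin against the required RDY setup time; negative means a violation.
 * Board delays are not included. */
int rdy_setup_margin(int h1_period_ns, int signal_delay_ns, int tsu_required_ns)
{
    return rdy_setup_available(h1_period_ns, signal_delay_ns) - tsu_required_ns;
}
```

For the 0-wait-state cycle, rdy_setup_available(33, 8 + 5) is 20 ns, a margin of 3 ns. For the unmodified trailing edge, the margin is 33 - 18 - 17 = -2 ns. This is the shortfall that the RDY_B23NOT circuit is added to remove.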
Figure 4-30. RDY Signal Generation Timing Waveforms
4.6.7 Address Decode for Multiple Banks

Figure 4-31 illustrates the logical-to-physical address translation for the three 
memory banks used in the RDY signal generation example in section 4.6.6. 
Each memory bank has a different physical width, as shown by the physical
address column on the right side of the figure. The left side of the figure
represents the internal (logical) address ranges for each of the three memory banks.
Logical-to-physical address translation is controlled by the strobe control registers
and by their data size and memory width fields. The middle column of
Figure 4-31 shows the logical address field (top row) over the physical
address (bottom row) for each address translation case. The active address
fields are shaded gray, and the inactive address bits are white. The black fields
are special address bits that can selectively control multiple strobe lines or
choose between individual portions of a data word that is larger than the
physical memory it is accessing.

For example, in bank 2, the right side of the figure indicates that the physical 
memory width for this bank is 16 bits. The left side indicates that, regardless 
of the physical memory width, 32-, 16-, and 8-bit data can be moved by
programming the STRB0 control register. The low-order (shaded) bits of the logical/
physical address rows show how many bits are actually used for addresses
so that the correct high-order address bits can be assigned to bank decode.
Physical address bits A17 and A18 are chosen for bank decode because they
lie outside the used address bits. A17 and A18 decode between banks 1, 2,
and 3, with A18-A17 = (0,1) assigned to bank 1, (1,0) assigned to bank 2, and
(1,1) assigned to bank 3. Address bit A23 is set to 0 to isolate the STRB0
address space from the STRB1 and IOSTRB memory maps.

The dotted lines bounding the bank decode bits show that the external
address bits, A18-A17, line up perfectly, but their logical address counterparts
do not. The amount of reverse shift between the logical and physical addresses
depends on the size of the data being accessed and the width of the
physical memory. Each of the three address translation cases for each of the
three banks translates physical address bits A18-A17 into two contiguous
logical address bits that can lie anywhere between A20 and A17. Once the
logical images of the external bank decode bits are identified along with the
low-order address bits and the A23 strobe decode bit, they define the final
logical memory map for the three STRB0 banks together.
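The logical ranges in Figure 4-31 follow a regular pattern that can be computed. The sketch below uses a hypothetical helper and assumes 32K-location banks and the A18-A17 bank codes given above. The bank code is placed at physical bit 17, then shifted up by the data-size shift (2, 1, or 0 for 8-, 16-, or 32-bit data) to give the logical base. The range length is the number of data items that fit in the bank.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } addr_range;

/* Logical address range for a bank selected by A18:A17 = bank_code (A23 = 0).
 * ASSUMPTION: mem_locations physical locations of mem_width_bits each. */
addr_range logical_range(unsigned bank_code, unsigned mem_width_bits,
                         unsigned data_bits, uint32_t mem_locations)
{
    assert(data_bits == 8 || data_bits == 16 || data_bits == 32);
    assert(mem_width_bits == 8 || mem_width_bits == 16 || mem_width_bits == 32);
    unsigned shift = (data_bits == 8) ? 2u : (data_bits == 16) ? 1u : 0u;
    /* Number of data items of this size that fit in the bank. */
    uint32_t items = mem_locations * mem_width_bits / data_bits;
    addr_range r;
    r.first = ((uint32_t)bank_code << 17) << shift;
    r.last = r.first + items - 1u;
    return r;
}
```

For example, bank 2 (16 bits wide, code 10b) holding 8-bit data yields 100000h-10FFFFh. Bank 3 (8 bits wide, code 11b) holding 32-bit data yields 60000h-61FFFh. Both values match the ranges in the figure.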
Figure 4-31. Address Decode for Multiple Memory Banks

[Figure: logical address fields aligned over physical address fields for each bank
and data size. Recoverable address ranges (A23 = 0 throughout):]

                          Logical address range by data size
Bank  Width    A18-A17    32-bit data     16-bit data     8-bit data        Physical
1     32 bits  0,1        20000h-27FFFh   40000h-4FFFFh   80000h-9FFFFh     0h-7FFFh
2     16 bits  1,0        40000h-43FFFh   80000h-87FFFh   100000h-10FFFFh   0h-7FFFh
3     8 bits   1,1        60000h-61FFFh   C0000h-C3FFFh   180000h-187FFFh   0h-7FFFh
Each memory bank actually has three logical memory maps, depending on the
size of the data being accessed and the setting of the corresponding bits in the
STRB0 control register.

The address ranges in these logical memory maps are all different, yet all three 
maps translate perfectly into a single physical address map that identifies the 
bank. In using the three logical memory maps, the programmer must exercise 
caution to prevent overwriting 8-bit data with 16-bit data (or 16-bit data with 
32-bit data) that may have a different logical address but still occupy the same 
place in physical memory. To be certain that the logical address maps 
associated with 8-, 16-, and 32-bit data sizes do not overlap within a single 
physical memory bank, the three logical maps must be further divided into 
mutually exclusive areas before they are used by the programmer. Furthermore,
when a program jumps from one physical memory bank to another of
a different width, the memory width configuration bits in the appropriate strobe
register must be changed.
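One way to keep the three logical maps of a bank mutually exclusive is to compare buffers in a common unit. In the sketch below, which uses hypothetical helpers, each logical address is converted to a byte offset by multiplying it by the data size in bytes. This rule is derived from the Figure 4-31 ranges: for example, 32-bit address 20000h and 8-bit address 80000h both mark the start of bank 1. It is an assumption drawn from those ranges, not a TI-documented formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of a logical address within the STRB0 bank space.
 * ASSUMPTION: logical address times the data size in bytes (derived
 * from the Figure 4-31 ranges, e.g. 20000h * 4 == 80000h * 1). */
uint32_t byte_address(uint32_t logical, unsigned data_bits)
{
    assert(data_bits == 8 || data_bits == 16 || data_bits == 32);
    return logical * (data_bits / 8u);
}

/* True if two logical buffers occupy overlapping physical storage. */
bool buffers_alias(uint32_t log_a, unsigned bits_a, uint32_t count_a,
                   uint32_t log_b, unsigned bits_b, uint32_t count_b)
{
    uint32_t a0 = byte_address(log_a, bits_a);
    uint32_t a1 = a0 + count_a * (bits_a / 8u);   /* one past the end */
    uint32_t b0 = byte_address(log_b, bits_b);
    uint32_t b1 = b0 + count_b * (bits_b / 8u);
    return a0 < b1 && b0 < a1;                    /* half-open overlap */
}
```

In bank 1, for instance, the 32-bit word at 20000h and the 8-bit datum at 80003h share storage, but the 8-bit datum at 80004h does not overlap that word.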
4.7 How TMS320 Tools Interact With the TMS320C32’s Enhanced Memory Interface


The ’C32’s memory interface accesses external memory through one 24-bit
address bus and one 32-bit data bus. The data bus is shared by three mutually
exclusive strobes: STRB0, STRB1, and IOSTRB. Depending upon the address
accessed, the ’C32 activates one of these strobes. (See the
TMS320C3x User’s Guide for more information about memory maps.)


STRB0 and STRB1 can access 8-, 16-, or 32-bit data quantities from 8-, 16-,
or 32-bit-wide memory. Access is achieved by four signals within each strobe.
These signals are:

□ STRBx_B3/A–1

□ STRBx_B2/A–2

□ STRBx_B1

□ STRBx_B0

The listed signals serve as byte-enable pins for accessing a byte, half-word,
or full word from external memory. The first two signals also serve as
additional address pins when performing two or four consecutive accesses in 8- or
16-bit-wide external memory. The data accessed is truncated, packed, or
unpacked accordingly, with no additional overhead. The default value of a strobe
control register depends on the program memory width select (PRGW) pin level.
The following list shows the behavior of these pins, as dictated by the data
size and memory-width bit fields.

□ 8-bit-wide memory

■ STRBx_B3/A–1 and STRBx_B2/A–2 are address pins.

■ STRBx_B0 is a byte-enable/chip-select signal.

■ STRBx_B1 is not used.

□ 16-bit-wide memory

■ STRBx_B3/A–1 is an address pin.

■ STRBx_B1 and STRBx_B0 are byte-enable signals.

■ STRBx_B2/A–2 is not used.

□ 32-bit-wide memory

■ STRBx_B3/A–1, STRBx_B2/A–2, STRBx_B1, and STRBx_B0 are
byte-enable signals.
□ Data size:

■ 8-bit data: The physical address is the logical address shifted right by 2.

■ 16-bit data: The physical address is the logical address shifted right
by 1.

■ 32-bit data: The physical address is the logical address.

IOSTRB can access 32-bit data from 32-bit-wide memory. However, IOSTRB
does not have the flexibility of STRB0 and STRB1 because it is composed of
a single signal. IOSTRB bus cycles differ from STRB0 and STRB1 bus cycles.
(See the Interlocked Operations section in the Program Flow Control chapter
of the TMS320C3x User’s Guide for more information.) This timing difference
accommodates slower I/O peripherals.

The ’C32 also supports program execution from 16- and 32-bit external 
memory widths. Execution is controlled through the status of the PRGW pin. 
When this pin is pulled high, the ’C32 executes from 16-bit-wide memory. 
When the PRGW pin is pulled low, the ’C32 executes from 32-bit-wide 
memory. For 16-bit-wide zero-wait-state memory, the ’C32 takes two
instruction cycles to fetch a single 32-bit instruction. The lower 16 bits of the
instruction are obtained during the first cycle; the upper 16 bits are retrieved and
concatenated with the lower 16 bits during the second cycle. The ’C32’s 32-bit
memory fetches are identical to those of the ’C30 and ’C31. 
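The 16-bit program fetch described above amounts to a concatenation in which the first fetch supplies the lower half. The sketch shows that step, plus a hypothetical generalization to narrower memory. The generalization assumes the same order (lowest address first, into the least significant bits); that order for 8-bit data packing is an assumption here and is not stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit instruction from two 16-bit fetches: first fetch -> bits 15-0,
 * second fetch -> bits 31-16 (as described in the text). */
uint32_t fetch_from_16bit_memory(uint16_t first_fetch, uint16_t second_fetch)
{
    return ((uint32_t)second_fetch << 16) | first_fetch;
}

/* Hypothetical generalization: pack 32 / width consecutive accesses,
 * ASSUMING the lowest address supplies the least significant bits. */
uint32_t pack_word(const uint32_t *parts, unsigned mem_width_bits)
{
    assert(parts);
    assert(mem_width_bits == 8 || mem_width_bits == 16 || mem_width_bits == 32);
    unsigned n = 32u / mem_width_bits;
    uint32_t mask = (mem_width_bits == 32) ? 0xFFFFFFFFu
                                           : ((1u << mem_width_bits) - 1u);
    uint32_t word = 0;
    for (unsigned i = 0; i < n; i++)
        word |= (parts[i] & mask) << (i * mem_width_bits);
    return word;
}
```

With this ordering, fetching 5678h and then 1234h from 16-bit memory yields the instruction word 12345678h.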

In summary, the ’C32 memory interface parallel bus implements three mutually
exclusive address spaces that are distinguished through the use of three
separate control signals (see Figure 4-32). STRB0 and STRB1 support 8-,
16-, and 32-bit data access in 8-, 16-, and 32-bit-wide external memory and
32-bit program access in 16/32-bit-wide external memory. The IOSTRB address
space supports 32-bit data/program access in 32-bit-wide external memory.
Internally, the ’C32 has a 32-bit architecture; accordingly, the memory
interface packs and unpacks the data accessed. Three strobe control registers
configure the variable-width memory interface of the ’C32. (See the
TMS320C3x User’s Guide for a detailed description of the ’C32 memory
interface.)
Figure 4-32. TMS320C32 Memory Address Spaces



4.7.1 C Compiler Interaction With the TMS320C32 Memory Interface 

The ’C32’s internal 32-bit architecture allows the C compiler’s data types to
remain 32 bits wide. However, the C compiler provides a pragma directive, and its
runtime-support library provides new dynamic-allocation routines (malloc, realloc,
calloc, bmalloc, free, etc.), that support the creation of data sections. These data
sections serve as memory pools for storing 8- and 16-bit data. These sections
can reside in 8-, 16-, or 32-bit-wide memory. The programmer must ensure
that the appropriate strobe control register is loaded with the correct data size
and memory width. The ’C32’s memory interface truncates, packs, or unpacks
the data in the manner specified by the settings of the strobe control register.
Table 4-7 lists the data sizes supported by the sections created by the C
compiler.

Table 4-7. Data Sizes Supported by Sections Created by the C Compiler

Section Type     32 Bits          16 Bits          8 Bits
Initialized      .text            .user_section    .user_section
                 .cinit
                 .const
                 .user_section
Uninitialized    .bss             .sysm16          .sysm8
                 .stack           .user_section    .user_section
                 .sysmem
                 .user_section
The contents of the named sections are as follows:

□ .text: executable code and/or string literals 

□ .cinit: tables for variable and constant initialization 

□ .const: string literals and switch tables 

□ .bss: global variables and statically allocated variables 

□ .stack: system stack used to pass function arguments and to allocate local 
function variables 

□ .sysmem: memory pool for dynamic allocation of 32-bit data 

□ .sysm16: memory pool for dynamic allocation of 16-bit data

□ .sysm8: memory pool for dynamic allocation of 8-bit data

□ .user_section: section created using the #pragma DATA_SECTION
directive

The following sections describe the C compiler’s preprocessor pragma and the
modules in the runtime-support library that support 8- and 16-bit memory
pools. The 32-bit memory pools are handled through the standard minit(),
malloc(), bmalloc(), calloc(), realloc(), and free() routines, which operate on the
.sysmem section.

4.7.1.1 DATA_SECTION Pragma Directive 

To support additional memory pools, the C compiler uses a data section
pragma directive. This directive instructs the C compiler to allocate space for
symbol_name in the section specified by section_name; the amount of space is
determined by the declared type of symbol_name. (See the TMS320
Floating-Point DSP Optimizing C Compiler User’s Guide for additional
information.) The syntax for DATA_SECTION is as follows:

#pragma DATA_SECTION (symbol_name, "section_name") 
type symbol_name; 

For example, the following declaration places a 1K-element integer array,
dataBuf, in a new section called .mydata:

#pragma DATA_SECTION(dataBuf, ".mydata")
int dataBuf[1024];
4.7.1.2 MEMORY8.C Module

The MEMORY8.C module contains functions that implement dynamic
memory management routines for using 8-bit data with the ’C32. (See the
TMS320C3x/C4x Optimizing C Compiler User’s Guide for more information on
8-bit runtime-support functions.)

The pragma directive in the MEMORY8.C module defines a .sysm8 section.
The size of this memory pool in words (system memory or heap) is set at link
time by using the -heap8 option. If the -heap8 option is not used, the compiler
does not allocate an 8-bit system memory area. If the -heap8 option is used
without a size argument, the size of the 8-bit system memory area defaults
to 1K 8-bit words. The following functions operate in the 8-bit .sysm8 section:

□ minit8(): initializes and resets the 8-bit dynamic memory management 
system 

□ malloc8(): allocates 8-bit words from the 8-bit memory pool and returns 
a pointer to the allocated space 

□ calloc8(): allocates 8-bit words from the 8-bit memory pool, clears
allocated memory locations, and returns a pointer to the allocated space

□ realloc8(): reallocates 8-bit words from previously unallocated areas in 
the 8-bit memory pool; a pointer to the allocated space is returned 

□ free8(): frees previously allocated space from the 8-bit memory pool 

□ bmalloc8(): allocates 8-bit words from the 8-bit memory pool. The
allocated words are aligned to a boundary that is suitable for the ’C32’s
circular and bit-reversed buffers; a pointer to the allocated space is returned.

□ _SYSMEM8_SIZE: an external label that contains the size, in words, of 
the 8-bit system memory pool 

4.7.1.3 MEMORY16.C Module

The MEMORY16.C module contains functions that implement dynamic 
memory management routines for the ’C32’s 16-bit data. (See the 
TMS320C3x/C4x Optimizing C Compiler User’s Guide for more information on
16-bit runtime-support functions.) 

The pragma directive in the MEMORY16.C module defines a .sysm16 section.
The size of this memory pool in words (system memory or heap) is set at link
time by using the -heap16 option. If the -heap16 option is not used, the
compiler does not allocate a 16-bit system memory area. If the -heap16 option is
used without a size argument, the size of the 16-bit system memory area
defaults to 1K 16-bit words. The following functions operate in the 16-bit
.sysm16 section:

□ minit16(): initializes and resets the 16-bit dynamic memory management
system

□ malloc16(): allocates 16-bit words from the 16-bit memory pool and
returns a pointer to the allocated space

□ calloc16(): allocates 16-bit words from the 16-bit memory pool, clears
allocated memory locations, and returns a pointer to the allocated space

□ realloc16(): reallocates 16-bit words from previously unallocated areas
in the 16-bit memory pool; a pointer to the allocated space is also returned

□ free16(): frees previously allocated space from the 16-bit memory pool

□ bmalloc16(): allocates 16-bit words from the 16-bit memory pool. The
allocated words are aligned to a boundary that is suitable for the ’C32’s
circular and bit-reversed buffers; a pointer to the allocated space is also
returned.

□ _SYSMEM16_SIZE: an external label that contains the size, in words,
of the 16-bit system memory pool

4.7.1.4 Memory Pool Limitations 

The ’C32 has only three strobes: STRB0, STRB1, and IOSTRB. This means
a programmer cannot have more than three memory pools, one memory pool
assigned to each strobe. IOSTRB can hold only 32-bit data and therefore can
accommodate only the 32-bit memory pool, .sysmem. Conversely, STRB0 and
STRB1 can hold 8-, 16-, and 32-bit data and can accommodate the 8-, 16-, and
32-bit memory pools .sysm8, .sysm16, and .sysmem.

All pointers and constants must be stored in memory configured to hold 32-bit 
data. Hence, the .bss, .stack, .cinit, and .const sections must reside in memory 
with data size configured to 32 bits. 

4.7.2 C Compiler and Assembler Switch 

To create code for the ’C32, the assembler and C compiler use the -v32 version 
specification switch. The following example demonstrates the use of this 
switch with the assembler and C compiler, respectively: 
4.7.3 Linker Switches

To support the ’C32’s 8- and 16-bit memory pools, the linker uses the following 
switches: -heap8, -heap16, and -heap. These switches set the size, in words, 
of the respective 8-, 16-, and 32-bit memory system areas .sysm8, .sysm16, 
and .sysmem. The user must link these sections into the appropriate
addresses, thereby activating strobes that are configured to access 8-, 16-, or 32-bit
data. 

The following example demonstrates the link-time sizing of an 8-bit memory
pool to 16K (4000h) words:

lnk30 -heap8 0x4000


The linker creates these memory system areas using an input file that contains 
the .sysmem, .sysm8, and .sysm16 data-section definitions. If the input file 
does not exist, the linker is unable to perform memory area processing. 

The linker also creates the global symbols _SYSMEM_SIZE,
_SYSMEM8_SIZE, and _SYSMEM16_SIZE and subsequently assigns each a
value equal to the respective -heap, -heap8, and -heap16 size. The default size
for each memory system area is 1K words (word size depends on system
memory width).

4.7.4 Debugger Configuration 

For the debugger to properly disassemble and read/write external memory,
the user must configure the strobe control registers before loading and
executing code. Because the ’C32 supports code execution from 16- or 32-bit
memory, the debugger may need to temporarily set the strobe control register 
to a 32-bit data size in order to write an instruction (either by loading code or 
patching code) or to read an instruction with the objective of disassembling a 
range of program memory. 

To support code execution from 16- and 32-bit memory, the memory map add 
(ma) command includes a new type parameter that directs the debugger to 
treat .text sections as 32-bit data. While reading or writing .text sections, the 
debugger does the following: 

□ Temporarily stores the configuration of the appropriate strobe control 
register 

□ Temporarily sets the data size to 32 bits 

□ Reads or writes the targeted portion of the .text section 

□ Restores the strobe control register to its previous value 
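The four-step save/modify/access/restore sequence above can be sketched in C. The strobe_ctrl structure is a hypothetical stand-in that models only the two fields involved, not the real register bit layout. The reader callback represents whatever mechanism actually performs the memory access. stub_read_text is a test stub that reports the data size in effect during the access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a strobe control register (fields only,
 * not the real bit layout). */
typedef struct {
    unsigned data_bits;       /* data size field: 8, 16, or 32 */
    unsigned mem_width_bits;  /* physical memory width field: 8, 16, or 32 */
} strobe_ctrl;

/* Hypothetical accessor that performs the actual memory read. */
typedef uint32_t (*text_reader)(const strobe_ctrl *reg, uint32_t addr);

uint32_t debugger_access_text(strobe_ctrl *reg, text_reader read, uint32_t addr)
{
    assert(reg && read);
    strobe_ctrl saved = *reg;          /* store the current configuration */
    reg->data_bits = 32;               /* temporarily select 32-bit data */
    uint32_t insn = read(reg, addr);   /* read the targeted .text location */
    *reg = saved;                      /* restore the previous value */
    return insn;
}

/* Test stub: returns the data size in effect during the access. */
uint32_t stub_read_text(const strobe_ctrl *reg, uint32_t addr)
{
    (void)addr;
    return reg->data_bits;
}
```

With a register configured for 16-bit data, the stub observes a 32-bit data size during the access, and the 16-bit setting is back in place afterward.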
The syntax for the memory map add command is:

ma address, length, type 

where: 

address defines the starting address of a range of memory 

length defines the length of the memory range 

type identifies the read/write characteristics of the memory range,
using one or more of the following keywords:

□ R: read only 

□ W: write only 

□ WR or RAM: read/write 

□ PROTECT: no-access memory 

□ TX: memory that stores .text (code) section 

4.7.5 TMS320C32 Configuration Examples

This section describes the possible ’C32 memory interface configurations,
including instructions on how to allocate buffers, build link files, and configure
the debugger for each memory configuration. 

4.7.5.1 Two External Memory Banks 

The ’C32’s external memory interface allows the use of two zero-wait-state
external memory banks with different widths without requiring additional logic or
incurring access penalties. These external memory banks provide flexibility
in balancing performance and system cost (both performance and system cost
increase with wider memory chips). For example, the programmer can
execute code from 32-bit-wide memory while storing data in 8-bit memory (see
Figure 4-33). This approach is advantageous for applications with large 
amounts of 8-bit data that require execution at the fastest speed of the device. 
Figure 4-33. Zero-Wait-State Interface for 32-Bit and 8-Bit SRAM Banks
In Figure 4-33, a bank of 32K × 32 bits is mapped to STRB0, and a bank of 32K × 8 bits is mapped to STRB1. For this configuration, the programmer must set the following:

□ STRB0 control register physical memory width to 32 bits and the data type size to 32 bits

□ STRB config bit field to 0 (banks are separate), that is, STRB0 control register = 000F0000h

□ STRB1 control register physical memory width to 8 bits and the data type size to 8 bits, that is, STRB1 control register = 00000000h

Additionally, the PRGW pin must be pulled low to indicate 32-bit program 
memory width. 
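As a sketch of how the control register values above are composed, the C fragment below builds them from named fields. The bit positions (data type size in bits 17-16, physical memory width in bits 19-18, STRB config in bit 21) and the encodings (00b for 8 bits, 01b for 16-bit data, 11b for 32 bits) are inferred from the four values quoted in this section (000F0000h, 00000000h, 002F0000h, 000D0000h). The macro names and strb_control are hypothetical and are not the PRTS30 header names; all other register fields are left at zero, as in the values above.

```c
#include <stdint.h>

/* Field encodings inferred from the register values quoted in the text. */
#define DATA_SIZE_8   (0x0u << 16)  /* data type size, bits 17-16           */
#define DATA_SIZE_16  (0x1u << 16)
#define DATA_SIZE_32  (0x3u << 16)
#define MEM_WIDTH_8   (0x0u << 18)  /* physical memory width, bits 19-18    */
#define MEM_WIDTH_32  (0x3u << 18)
#define STRB_CONFIG   (0x1u << 21)  /* 1 = STRB0 and STRB1 overlay one bank */

/* Combine width, data size, and (STRB0 only) the STRB config bit. */
static uint32_t strb_control(uint32_t width, uint32_t size, int overlay)
{
    return width | size | (overlay ? STRB_CONFIG : 0u);
}
```

On the target, the result would be stored to the STRB0 control register (808064h) or the STRB1 control register (808068h), as the C examples in this section do with literal values.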
Figure 4-33 also maps the 32-bit-wide bank’s external memory address pins, A14A13...A1A0, to the ’C32’s A14A13A12...A1A0 pins. Conversely, the 8-bit-wide bank’s memory address pins, A14A13...A1A0, are mapped to the ’C32’s A12...A1A0A-1A-2 pins. Because STRB1 is configured for 8-bit memory width, the external address presented on the ’C32 pins is shifted right by two bits. As a result of this mapping, external memory accesses in the range 0h through 7FFFh read or write 32-bit data to the 32-bit-wide bank (STRB0). Memory accesses in the range 900000h through 907FFFh read or write 8-bit data to the 8-bit-wide bank (STRB1).
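The wiring just described can be checked with a small model. The sketch assumes exactly the two-bit shift stated above: for an 8-bit-wide STRB1 access, ’C32 pins A12...A0 carry logical address bits 14...2, and pins A-1 and A-2 carry bits 1 and 0. The struct and function names are hypothetical. Because the SRAM’s A1 and A0 are wired to A-1 and A-2, the shift cancels out and the SRAM sees the low 15 bits of the logical address.

```c
#include <stdint.h>

/* Pin values driven by the 'C32 for an 8-bit-wide STRB1 access (assumed). */
struct c32_pins8 {
    uint32_t a12_a0;  /* pins A12..A0: logical bits 14..2 */
    uint32_t a_m1;    /* pin A-1: logical bit 1           */
    uint32_t a_m2;    /* pin A-2: logical bit 0           */
};

static struct c32_pins8 drive_pins_8bit(uint32_t logical)
{
    struct c32_pins8 p;
    p.a12_a0 = (logical >> 2) & 0x1FFFu;
    p.a_m1   = (logical >> 1) & 1u;
    p.a_m2   = logical & 1u;
    return p;
}

/* Figure 4-33 wiring: SRAM A14..A2 <- A12..A0, A1 <- A-1, A0 <- A-2. */
static uint32_t sram_address_8bit(struct c32_pins8 p)
{
    return (p.a12_a0 << 2) | (p.a_m1 << 1) | p.a_m2;
}
```

For instance, logical address 901234h drives 048Dh on A12...A0 and 0 on both A-1 and A-2, and the SRAM sees address 1234h.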

Two banks of different memory widths must not be connected to the same STRB without external decode logic. Different memory widths require the STRBx_Bx signals to be configured as address pins. These address pins are active for any external memory access, that is, STRB0, STRB1, IOSTRB, and program fetches.

8-Bit Dynamic Memory Allocation

This section contains C code examples of 8-bit dynamic buffer allocation, linker configuration, and a debugger batch file.

Example 4-1 demonstrates the allocation of two buffers (1K and 4K 8-bit words) using the 8-bit dynamic memory allocation routines.

Example 4-1. 8-Bit Dynamic Buffer Allocation 


void main()
{
   int   *buffer1;
   float *buffer2;

   /* Configure the STRB0 control register for 32-bit-wide memory,
      32-bit data size. */
   *(volatile int *)0x808064 = 0xF0000;

   /* Configure the STRB1 control register for 8-bit-wide memory,
      8-bit data size. */
   *(volatile int *)0x808068 = 0x00000;

   /* Allocate 1K 8-bit words in the 8-bit memory pool. */
   buffer1 = malloc8(1024 * sizeof(int));

   /* Allocate 4K 8-bit floats in the 8-bit memory pool. */
   buffer2 = malloc8(4096 * sizeof(float));

   /* Process buffers. */
   callDSPoperation(buffer1, buffer2);

   /* Free buffers. */
   free8(buffer2);
   free8(buffer1);
}


Note:

The TMS320 floating-point C compiler sizeof operator returns 1 for both integer and float data types.
Example 4-2 allocates sections of the preceding code into the desired memory configuration.


Example 4-2. Linker Command File

sample.obj          /* Input filename                    */
-heap8 32768        /* Set 8-bit memory pool size.       */
-stack 8704         /* Set C system stack size.          */
-o sample.out       /* Specify output file.              */
-m sample.map       /* Specify map file.                 */

MEMORY
{
   PRGRAM:   org = 0x0000,   len = 0x2000
   STRB0RAM: org = 0x2000,   len = 0x6000
   ONCHIRAM: org = 0x87FE00, len = 0x200
   STRB1RAM: org = 0x900000, len = 0x8000
}

SECTIONS
{
   .text  > PRGRAM      /* 32-bit data section               */
   .cinit > STRB0RAM    /* 32-bit data section               */
   .const > STRB0RAM    /* 32-bit data section               */
   .bss   > STRB0RAM    /* 32-bit data section               */
   .stack > STRB0RAM    /* 32-bit data section               */
   .sysm8 > STRB1RAM    /* 8-bit memory pool mapped to STRB1 */
}


The debugger batch file shown in Example 4-3 executes initialization commands that configure the C source debugger to handle a ’C32 with the memory configuration shown in Figure 4-33 on page 4-75.
Example 4-3. Debugger Batch File

mr
sconfig init.clr
; Define memory configuration.
ma 0x0000,   0x2000, R|W|TX   ; Inform debugger that this section holds code (.text).
ma 0x2000,   0x6000, RAM      ; No code here, STRB0
ma 0x87FE00, 0x200,  RAM      ; On-chip
ma 0x808000, 0x10,   RAM      ; Peripheral Bus Control - DMA
ma 0x808020, 0x20,   RAM      ; Peripheral Bus Control - Timers
ma 0x808040, 0x10,   RAM      ; Peripheral Bus Control - Serial Port 0
ma 0x808060, 0x10,   RAM      ; Peripheral Bus Control - External Memory Interface
ma 0x900000, 0x8000, RAM      ; STRB1
reset
map on                        ; Make emulator aware of this memory configuration.
?*0x808064 = 0xF0000          ; Set STRB0 control register to 32-bit memory width,
                              ; 32-bit data size.
?*0x808068 = 0x00000          ; Set STRB1 control register to 8-bit memory width,
                              ; 8-bit data size.
load sample.out               ; Configure STRB0 and STRB1 control registers before
                              ; loading code.


8-Bit Static Memory Allocation

This section provides examples of 8-bit static buffer allocation and associated 
linker configuration. The debugger batch file is identical to the batch file in 
Example 4-3 and, therefore, is not shown. 

The C code in Example 4-4 demonstrates the static allocation of two buffers (1K and 4K 8-bit words) by defining a user section called .mydata8. This section holds a structure consisting of two arrays of data values.
Example 4-4. 8-Bit Static Buffer Allocation


#pragma DATA_SECTION(buffer8, ".mydata8")
struct bufferStruct {
   int   in[1024];
   float out[4096];
} buffer8;

void main()
{
   /* Configure the STRB0 control register for 32-bit-wide memory,
      32-bit data size. */
   *(volatile int *)0x808064 = 0xF0000;

   /* Configure the STRB1 control register for 8-bit-wide memory,
      8-bit data size. */
   *(volatile int *)0x808068 = 0x00000;

   /* Process buffers. */
   callDSPoperation(buffer8.in, buffer8.out);
}


The linker command file in Example 4-5 allocates sections of the above C 
code into the desired memory configuration. 

Example 4-5. Linker Command File

sample.obj          /* Input filename                    */
-stack 8704         /* Set C system stack size.          */
-o sample.out       /* Specify output file.              */
-m sample.map       /* Specify map file.                 */

MEMORY
{
   PRGRAM:   org = 0x0000,   len = 0x2000
   STRB0RAM: org = 0x2000,   len = 0x6000
   ONCHIRAM: org = 0x87FE00, len = 0x200
   STRB1RAM: org = 0x900000, len = 0x8000
}

SECTIONS
{
   .text    > PRGRAM     /* 32-bit data section               */
   .cinit   > STRB0RAM   /* 32-bit data section               */
   .const   > STRB0RAM   /* 32-bit data section               */
   .bss     > STRB0RAM   /* 32-bit data section               */
   .stack   > STRB0RAM   /* 32-bit data section               */
   .mydata8 > STRB1RAM   /* 8-bit memory pool mapped to STRB1 */
}
4.7.5.2 Single External Memory Bank

Consider a typical audio compression application written in C that requires 32-bit data for the system stack and 16-bit data for the audio buffers. In this case, the programmer can interface the ’C32 as shown in Figure 4-34. This example assumes 32K 32-bit words of external memory. Mapped to STRB0 are 8.5K 32-bit words of stack (0.5K of which resides in internal RAM) and 8K 32-bit words of program space (program space includes constants and global/static variables). The remaining external memory holds 32K 16-bit words of data buffers, which are mapped to STRB1.
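The sizes quoted above add up as follows, assuming (as Figure 4-36 and Example 4-7 indicate) that the first 200h words of stack occupy internal RAM at 87FE00h-87FFFFh and the rest occupies external memory starting at 880000h:

```latex
\begin{aligned}
\text{stack} &= \underbrace{200\text{h}}_{\text{internal}} + \underbrace{2000\text{h}}_{\text{external}} = 2200\text{h} = 8704 \text{ words} = 8.5\text{K}\\
\text{external memory} &= \underbrace{8\text{K}}_{\text{stack}} + \underbrace{8\text{K}}_{\text{program}} + \underbrace{\frac{32\text{K} \times 16\ \text{bits}}{32\ \text{bits}}}_{\text{16-bit buffers}} = 32\text{K} \times 32\ \text{bits}
\end{aligned}
```

The 8704-word stack total is the value passed to the linker with -stack 8704 in Example 4-7.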

Due to this mapping, the programmer must set the following: 

□ STRB0 control register physical memory width to 32 bits and the data type size to 32 bits

□ STRB configuration bit field to 1 (STRB0 and STRB1 overlay the same bank), that is, STRB0 control register = 002F0000h

□ STRB1 control register physical memory width to 32 bits and the data type size to 16 bits, that is, STRB1 control register = 000D0000h

Additionally, the PRGW pin must be pulled low to indicate 32-bit program 
memory width. 
Figure 4-34. Zero-Wait-State Interface for 32-Bit SRAMs with 16- and 32-Bit Data Accesses

[Figure: TMS320C32 connected to 32-bit-wide memory banks]


The external memory address pins A14A13...A1A0 are mapped to the ’C32’s A22A13A12...A1A0 pins. This mapping was selected to position the system stack immediately after the ’C32’s internal RAM. Performance improves because the top of the stack resides in internal RAM, and the stack is allowed to grow into external RAM. With this mapping, external memory accesses in the range 4000h through 7FFFh read or write 16-bit data; memory accesses in the range 0h through 3FFFh read or write 32-bit data. The PRGW pin controls the program fetches.
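The mapping can be modeled as follows. The sketch assumes, as the ranges in Figures 4-35 and 4-36 imply, that a 32-bit access through STRB0 drives the logical address unshifted, while a 16-bit access through STRB1 drives it shifted right by one bit, with the dropped bit selecting one 16-bit half of the 32-bit word. SRAM A14 is wired to pin A22 and SRAM A13...A0 to pins A13...A0. The names sram_location and locate are hypothetical.

```c
#include <stdint.h>

struct sram_location {
    uint32_t word;  /* physical 32-bit word address in the SRAM bank        */
    int      half;  /* -1 for a 32-bit access; else 0 (even) or 1 (odd)     */
};

/* is_16bit_data: nonzero for STRB1 accesses (16-bit data on 32-bit memory). */
static struct sram_location locate(uint32_t logical, int is_16bit_data)
{
    struct sram_location loc;
    uint32_t pins = is_16bit_data ? (logical >> 1) : logical;  /* assumed shift */

    /* SRAM A14 <- pin A22, SRAM A13..A0 <- pins A13..A0 */
    loc.word = (((pins >> 22) & 1u) << 14) | (pins & 0x3FFFu);
    loc.half = is_16bit_data ? (int)(logical & 1u) : -1;
    return loc;
}
```

With this model, logical 880000h lands on SRAM word 0h (the external part of the stack), logical 900000h and 900001h share SRAM word 4000h, and logical 907FFFh lands on 7FFFh, which matches the ranges described above.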

Figure 4-35 shows the contents of external memory. Because of the address 
shift of the ’C32’s external memory interface, the memory map for the ’C32 
CPU is slightly different (see Figure 4-36). 
Figure 4-35. External Memory Map

Address        Contents
0h - 1FFFh     System stack area (8K × 32 bits)
2000h          Program word 0
2001h          Program word 1
...            ...
3FFFh          Program word 8191
4000h          Data1     | Data0
4001h          Data3     | Data2
...            ...
7FFFh          Data32767 | Data32766

Note: For 32-bit data, physical address = logical address.
For 16-bit data, physical address = logical address shifted left by 1.
Figure 4-36. TMS320C32 Memory Map

[Figure: logical address map with boundaries at 0h, 2000h, 3FFFh, 4000h, 87FE00h, 87FFFFh, 880000h, 881FFFh, 900000h, 907FFFh, and FFFFFFh]

Note: For 32-bit data, physical address = logical address.
For 16-bit data, physical address = logical address shifted left by 1.
16-Bit Dynamic Memory Allocation

This section contains C code examples of 16-bit dynamic buffer allocation, 
linker configuration, and a debugger batch file. 

The following C code demonstrates the allocation of two buffers (1K and 4K, 
16-bit words) using the 16-bit dynamic memory allocation routines provided 
by the runtime-support library. 

Example 4-6. 16-Bit Dynamic Buffer Allocation 


#include <bus30.h>

void main()
{
   int   *buffer1;
   float *buffer2;

   /* Configure the STRB0 control register for STRB0 and STRB1 overlay,
      32-bit-wide memory, 32-bit data size.
      If using the PRTS30 headers:
      BUS_ADDR->STRB0_gcontrol = STRB0_1_CNFG | MEMW_32 | DATA_32; */
   *(volatile int *)0x808064 = 0x2F0000;

   /* Configure the STRB1 control register for 32-bit-wide memory,
      16-bit data size.
      If using the PRTS30 headers:
      BUS_ADDR->STRB1_gcontrol = MEMW_32 | DATA_16; */
   *(volatile int *)0x808068 = 0xD0000;

   /* Allocate 1K 16-bit words in the 16-bit memory pool. */
   buffer1 = malloc16(1024 * sizeof(int));

   /* Allocate 4K 16-bit floats in the 16-bit memory pool. */
   buffer2 = malloc16(4096 * sizeof(float));

   /* Process buffers. */
   callDSPoperation(buffer1, buffer2);

   /* Free buffers. */
   free16(buffer2);
   free16(buffer1);
}


The linker command file in Example 4-7 allocates sections of the preceding 
C code into the memory configuration depicted in Figure 4-35 on page 4-82. 
Example 4-7. Linker Command File

sample.obj          /* Input filename                     */
-heap16 32768       /* Set 16-bit memory pool size.       */
-stack 8704         /* Set C system stack size.           */
-o sample.out       /* Specify output file.               */
-m sample.map       /* Specify map file.                  */

MEMORY
{
   STRB0RAM: org = 0x2000,   len = 0x2000
   STACKRAM: org = 0x87FE00, len = 0x2200
   STRB1RAM: org = 0x900000, len = 0x8000
}

SECTIONS
{
   .text   > STRB0RAM    /* 32-bit data section                */
   .cinit  > STRB0RAM    /* 32-bit data section                */
   .const  > STRB0RAM    /* 32-bit data section                */
   .bss    > STRB0RAM    /* 32-bit data section                */
   .stack  > STACKRAM    /* 32-bit data section                */
   .sysm16 > STRB1RAM    /* 16-bit memory pool mapped to STRB1 */
}


The debugger batch file in Example 4-8 executes initialization commands that configure the C source debugger to handle a ’C32 with the memory configuration shown in Figure 4-36 on page 4-83.

Example 4-8. Debugger Batch File

mr
sconfig init.clr
; Define memory configuration.
ma 0x2000,   0x2000, R|W|TX   ; Inform debugger that this section holds code (.text).
ma 0x87FE00, 0x2200, RAM
ma 0x900000, 0x8000, RAM
map on                        ; Make emulator aware of this memory configuration.
?*0x808064 = 0x2F0000         ; Set STRB0 control register to STRB0 and STRB1
                              ; overlay, 32-bit memory width, 32-bit data size.
?*0x808068 = 0xD0000          ; Set STRB1 control register to 32-bit memory width,
                              ; 16-bit data size.
load sample.out               ; Configure STRB0/STRB1 control registers before
                              ; loading code.
4.8 Booting a TMS320C32 Target System in a C Environment

A DSP system uses a boot procedure following power-up or reset to initialize the system volatile memory (such as SRAM) with the application program/data and to start execution of the application code. The SRAM is loaded either from a nonvolatile medium (EPROM) or from a PC development platform using a debugger/loader program. The loader uses an emulator cable to move the load file from the PC hard disk to the SRAM on the DSP target board. An EPROM boot causes the DSP to start program execution directly from 16- or 32-bit EPROM (microprocessor mode). Alternatively, a hard-wired on-chip boot loader program copies the boot table from an 8-bit EPROM to internal or external SRAM and then starts execution from the SRAM (microcomputer/boot loader mode).

TI supports four ways to boot a DSP system following power-up/reset. Each boot procedure uses a different combination of ’C32 silicon features, software, and hardware tools. Each combination forms an integrated development environment that includes features to support most system boot requirements.

A boot development flow includes two major tasks: 

1) Use the C compiler and assembly-level tools to compile, assemble, and link the boot code/data to create a binary common object file format (COFF) executable object.

2) Load the COFF file into the DSP target system.

Generating the COFF file (linker output .out file) uses the same flow for all boot 
methods. 

4.8.1 Generating a COFF File 

Generating a COFF file requires compiling the source code with the C compiler, then assembling and linking the resulting assembly files with the assembly-level tools. Additional assembly files are either created with a text editor or extracted from the RTS30 library. The linking process resolves all external references between program files and generates the .out COFF file subject to the specified options (such as the -c or -cr boot options).
4.8.1.1 Compiler


Figure 4-37 on page 4-89 shows how one or more C files are compiled into multiple assembly files. Each assembly file contains the compiled C functions, each decomposed into standard logical sections:

□ The program code is assigned to .text. 

□ The stack is assigned to .stack. 

□ Dynamically allocated memory is assigned to .sysmem. 

□ The switch tables are assigned to .const. 

□ Uninitialized variables are assigned to .bss. 

□ Initialized variables are assigned to .cinit.

If, following system reset, the program executes directly out of EPROM (microprocessor mode), a separate assembly file holds the reset vector (and possibly other interrupt vectors). The reset vector points to the address contained in the c_int00 symbol, which the linker resolves to the beginning of the BOOT.ASM routine (from the RTS30 library).


4.8.1.2 Assembler 


The assembler assembles all .asm files into their respective .obj files. Since each .asm file may have a .text section fragment for each function in the file, its .obj counterpart groups all the fragments into a single .text section. This applies to all sections in that file. The result of the assembly process is multiple .obj files, each composed of single instances of all standard C sections. In addition to the object files generated by the user, the subsequent boot procedures require another .obj file. The boot.asm file can be extracted from the RTS30 library and assembled separately into boot.obj. The boot.obj routine is the first routine executed following reset. It initializes the C environment by setting up the system stack, processing initialized variables, setting up the page pointer, and calling the main function. While the boot.asm file is required for a C program, other files may also be extracted from the library, such as malloc.asm, which is used to allocate additional memory at run time.
4.8.1.3 Linker


The linker assigns physical addresses to logical program sections from .obj files. A linker command file defines the available physical memory segments using the MEMORY directive, assigns one or more sections to individual memory segments using the SECTIONS directive, and lists all object files containing sections to be processed. The order in which object files are listed is important and reflects the order in which individual sections are stacked in physical memory. For that reason, the boot.obj file must always be listed first, since it represents the execution entry point for every C program. The boot.obj global symbol c_int00 provides the entry address that can be resolved in other files linked with boot.obj (for example, the vector file that needs an address for the reset vector). Depending on the method, the linker can be invoked with the -c or -cr option. These two options control how a C program’s initialized variables are handled during the later stages of the boot process. See the TMS320C3x/C4x Assembly Language Tools User’s Guide for more information.
Figure 4-37. Compile, Assemble, and Link Flow
4.8.1.4 The .out (COFF) File

After resolving the external references among all program sections, the linker builds the .out file. The .out file is constructed in the binary COFF format and contains all the sections listed in the linker SECTIONS directive. It contains information about the program, information about how to load it into the target DSP system, and symbol information that the debugger later uses to verify the code. All C and assembly symbols, such as subroutine labels, can be made visible in the debugger window (by embedding them in the COFF file), provided that they are declared as global symbols and the appropriate options are used with the code generation tools.

Some .out sections contain only starting addresses and no code or data. They include the .stack section for the system stack, the .sysmem section for dynamically allocated memory, and the .bss section for uninitialized data. The boot process also uses the .bss section as the destination for the initialized variables that are originally stored in the .cinit section of the .out file. Although they contain no data, the .stack and .sysmem sections are included in .out to allow the debugger tools to verify that the physical memory for those sections exists on the target board. Other sections in the COFF file, such as .vectors, .const, and .text, contain both the starting addresses and the contents of the sections. When the debugger loads the .text section into the target system, for example, the opcodes for all assembly instructions for the entire program are copied, beginning at the section starting address.

The .cinit section is different because it contains initialized variables. Once the .out file is generated, it can be burned into a 16- or 32-bit-wide EPROM, and the program can start executing directly from that EPROM following reset (in the microprocessor mode). But if the initialized variables reside in the same EPROM, they are not really variables, since the CPU cannot write to an EPROM device to change their values. For that reason, before user program execution begins, the boot.asm library routine copies the initialized variables from the EPROM .cinit section to the SRAM .bss section, one array of data at a time. Figure 4-37 on page 4-89 shows that the .cinit section is divided into individual array records; each array has a length, data content, and a destination address in the SRAM .bss section. The .bss section is the final destination for initialized variables, while the .cinit EPROM section is a temporary holding place for the data until it is copied following power-up/reset. The .cinit section also stores the -c/-cr linker option selection for use in the later stages of the boot process.
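The copy that boot.asm performs for the ROM model can be sketched in C. This is a host-side simulation, not the RTS30 source: target memory is modeled as an array of 32-bit words indexed by address, and the record layout (a length word, a destination address word, then the data words, with a zero length ending the table) is an assumption based on the C3x/C4x C compiler’s description of the .cinit format. The function name copy_cinit is hypothetical.

```c
#include <stdint.h>

/* Copy each .cinit record into simulated target memory.
   Assumed record layout: length, destination address, data[length];
   a length of 0 terminates the table. */
static void copy_cinit(const uint32_t *cinit, uint32_t *memory)
{
    for (;;) {
        uint32_t length = *cinit++;
        if (length == 0)
            break;                      /* end of .cinit table */
        uint32_t dest = *cinit++;       /* address in .bss */
        for (uint32_t i = 0; i < length; i++)
            memory[dest + i] = *cinit++;
    }
}
```

Under the RAM model (-cr), this run-time step is skipped, because the loader or hex utility has already placed the data in .bss.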
4.8.2 Loading the COFF File to the Target System

When the COFF file is loaded into the DSP target system, program and data content, as well as control information, are extracted. The control information is then used to place the program/data content in target memory. Some control information embedded in the COFF file may not apply directly to the program/data content. For example, the COFF file may include a symbol table for the debugger or a memory width control word for the on-chip boot loader.

Using the debugger to load the COFF file to target memory requires connecting the target board to the PC (on which the debugger is running) with an emulator cable and pod and then transferring the COFF file with the LOAD command. The linker -c/-cr options control processing of the .cinit section during the load operation.

The COFF file can also be loaded to a target system from an EPROM. The Hex30 utility converts the COFF file to an EPROM-programmer-compatible file that can be programmed into the EPROM. In the microprocessor mode, the program executes directly from the EPROM. In the microcomputer/boot loader mode, the on-chip boot loader first expands the EPROM contents into target SRAM, and the program executes from SRAM. In either case, the C program begins execution at the start of the boot.asm library routine, which initializes the C environment before the rest of the C program runs.

4.8.3 Debugger Boot 

Figure 4-38 on page 4-93 and Figure 4-39 on page 4-94 show how to load 
the COFF file into the target system using the debugger load command. 

The debugger is a standard TI software development tool that runs on a PC platform. The debugger accesses the target board through the PC emulator card and cable. The cable connects to the target board through a 12-pin connector that routes the signals to the DSP’s emulation pins. The emulation pins control the operation of the modular port scan device (MPSD) scan chain in the processor. Depending on the command issued by the debugger, the emulation circuitry in the scan chain stops or resumes processor operation, examines/loads registers or memory, sets breakpoints, or executes code one instruction at a time (called single-step execution). The debugger LOAD command reads the COFF file from the PC hard drive, extracts the program/data content, and transfers it through the emulator cable to the target board’s memory.
4.8.3.1 RAM Model (Linker -cr Option)

When the COFF file is loaded into the target board’s memory, most sections in the file are processed by copying the program/data to the address defined at the beginning of each section; however, the initialized variables in the .cinit section are processed differently. If the COFF file is generated by the linker using the -cr option, the .cinit section of the file is loaded using the RAM model (see Figure 4-38). The RAM model assumes that the target memory is composed exclusively of SRAM devices. Thus, the initialized variables can be copied directly to the SRAM .bss section, one array at a time, without first being placed in a temporary EPROM .cinit section. Once the initialized variables have been loaded into SRAM, the CPU can read or write them without any further initialization by boot.asm at the beginning of C program execution.

4.8.3.2 ROM Model (Linker -c Option) 

If the COFF file is created with the linker -c option, the loader places the .cinit section in the target memory according to the ROM model. The ROM model copies the .cinit section as one block to the address specified at the beginning of the same .cinit section. Following the load operation, the ROM model expects the boot.asm routine (at the beginning of the C program) to further process the .cinit section by copying its contents to the SRAM .bss section, one array at a time. Once boot.asm has done so, the memory content is the same as that created by the RAM model, with one exception: the target SRAM still contains the temporary .cinit section, which serves no purpose after it is processed by boot.asm. The ROM model is nevertheless useful, for example, to simulate the microprocessor-mode EPROM boot (see Figure 4-39). During the development cycle, instead of burning a new EPROM each time the code is modified, the EPROM can be removed and replaced with an equivalent SRAM device (by reconfiguring jumpers). The ROM model allows use of the loader to quickly load and debug the modified code while preserving the bus activity at power-up to simulate an EPROM boot.
Figure 4-38. Loading C Object File into TMS320C32 Memory (Linker -cr Option)
Figure 4-39. Loading C Object File into TMS320C32 Memory (Linker -c Option)
4.8.4 EPROM Boot


Booting a DSP target board from C code stored in nonvolatile memory accessible to the DSP can be done in two ways. If the DSP is powered up in the microprocessor mode, the reset causes the program to start execution from 32- or 16-bit EPROM by fetching the reset vector from memory address 000000h and branching to the reset interrupt service routine (ISR) pointed to by that vector.

On the other hand, if the DSP is powered up in the microcomputer/boot loader 
mode, program execution starts with the on-chip boot loader program. The 
boot loader reads the COFF file from an 8-bit EPROM and expands it to the 
system SRAM from which it can be executed (16 or 32 bits wide). In either 
case, program entry occurs at the beginning of the boot.asm library routine to 
initialize the C environment prior to execution of the C code. 

4.8.4.1 Microprocessor Mode (Linker -c Option) 

Before the binary COFF file can be burned into an EPROM, it must be converted to an ASCII format that an EPROM programmer can recognize (see Figure 4-40 on page 4-97). The hex conversion utility converts COFF files to a programmer object file format such as Intel™ Hex. The EPROM programmer uses the converted files to program one or more EPROMs that can be inserted into the DSP target board.

If the linker -c option is used to create the COFF file (ROM model), the hex utility copies the .cinit section directly into the programmer object file without processing its content. In other words, the .cinit section in the programmed EPROM contains the initialized data as well as the destination addresses and lengths in .bss for the individual .cinit data arrays. To start program execution from EPROM at power-up, the DSP must be configured in the microprocessor mode by pulling the MCBL/MP pin low. Triggered by the low-to-high transition of the RESET pin, the DSP executes the reset vector fetch read cycle. The reset vector points to the boot.asm routine, which is executed next. The linker -c option sets a control bit in the .cinit section of the COFF file.

When the boot.asm program processes the .cinit section, it checks the -c/-cr control bit. The -c option (ROM model) causes boot.asm to copy the contents of each array within the .cinit section to its destination in the .bss section mapped to SRAM. The initialized variables must be copied from EPROM to SRAM at the beginning of program execution, because they cannot be modified in EPROM (variable data must be changeable during program execution).
4.8.4.2 Microcomputer/Boot Loader Mode (Linker -cr Option)

The ’C32 features an on-chip hardwired boot loader program in the internal programmable logic array (PLA). The boot loader reduces the DSP target board cost by replacing multiple fast EPROMs with a single slow (inexpensive) 8-bit EPROM. Because the ’C32 cannot execute code from memory that is only 8 bits wide, the on-chip boot loader program reads the boot table from the byte-wide EPROM and reconstructs all sections of the original COFF file one byte at a time before placing the program/data in SRAM (see Figure 4-41 on page 4-98).

To power up the DSP in the boot loader mode, the MCBL/MP pin must be held 
high when the RESET signal is deasserted. At that stage, the DSP starts 
executing the boot loader code from internal address 000045h. Immediately 
after it starts execution, the boot loader checks the interrupt flag (IF) register. 
All interrupts are disabled and remain disabled until the application program 
enables them. Depending on which external interrupt is asserted, the boot 
loader looks for the boot table at one of three external memory locations or at 
the serial port. The interrupt pins carry a message to the boot loader telling it 
where to get the boot table after reset. 

The boot table structure resembles the COFF file from which it was derived by the hex conversion utility. The main feature that distinguishes the boot table from a regular hex utility output (such as the microprocessor mode boot example) is that, in addition to the contents of the COFF sections, the boot table includes special control words that instruct the on-chip boot loader program how to assemble and load those sections. Each section is built into a block preceded by three control words: block size, destination address, and destination memory width/data size. Multiple blocks can be transferred to selected parts of the DSP memory map. To format the COFF file into the boot table, the program sections to be booted must be identified to the hex conversion utility with the SECTIONS directive. The boot table is constructed of the COFF sections identified in the SECTIONS directive and marked with the boot option (see Figure 4-41).
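The block structure described above can be illustrated with a walker over a list of already-reassembled 32-bit words. This is a simplified, hypothetical model: it covers only the three per-block control words named in the text, in the order the text lists them; it ignores the leading boot table memory width word and the byte reassembly the boot loader performs; and it assumes that a block size of 0 ends the list. The names boot_block and walk_boot_blocks are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

struct boot_block {
    uint32_t        size;     /* number of data words in the block     */
    uint32_t        dest;     /* destination address in the memory map */
    uint32_t        control;  /* destination memory width/data size    */
    const uint32_t *data;     /* first data word of the block          */
};

/* Split a block list into at most max blocks; returns the count found.
   Assumption: a block size of 0 terminates the list. */
static size_t walk_boot_blocks(const uint32_t *p, struct boot_block *out,
                               size_t max)
{
    size_t n = 0;
    while (n < max && p[0] != 0) {
        out[n].size    = p[0];
        out[n].dest    = p[1];
        out[n].control = p[2];
        out[n].data    = p + 3;
        p += 3 + p[0];   /* skip header words and data words */
        n++;
    }
    return n;
}
```

A real loader would then copy each block’s data to dest using the width/data-size control word; that step is omitted here.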

If the linker uses the -cr option to create the COFF file, the hex utility processes the COFF .cinit section and assigns the addresses in the .bss section to the corresponding .cinit arrays in the boot table. Every C program starts execution with the boot.asm routine, but because one of the boot.asm control flags indicates that the COFF file was created with the linker -cr option, the code skips the transfer of .cinit contents to .bss. The hex utility performs that task by placing all the initialized variables in .bss while creating the boot table, without relying on boot.asm to make the transfer at run time (see Figure 4-41).
Figure 4-40. 32-Bit EPROM Boot in the Microprocessor Mode (Linker -c Option)
Figure 4-41. 8-Bit EPROM Boot Using the On-Chip Boot Loader (Linker -cr Option)
4.8.5 Boot Table Memory Considerations

There is a significant difference between the way the external memory holding
the boot table is interfaced and the way the program/data memory used during
normal code execution is interfaced. The address presented on the ’C32’s pins
may be shifted by one or two bits, depending on the width of the memory bank
(see Figure 4-42), but the external memory holding the boot table must have no
address shift relative to the ’C32 address pins, regardless of the width of the
boot memory (see Figure 4-43). The boot loader program reads the boot table
memory width from the first word of the boot table. It reads the boot table
contents as 32-bit data and, depending on the memory width, reconstructs the
program and data before sending them to the memory map. Because of this
difference in the address shift, the byte-wide EPROM containing the boot table
is not well suited to storing normal data unless special hardware is added to
handle the address shift.
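The word reconstruction performed by the boot loader can be modeled in C, for example to check a boot table image on a host. This is a minimal sketch, not the on-chip boot loader: assemble_words() is a hypothetical name, it assumes that each 32-bit word occupies 32/width consecutive boot-memory locations with no address shift, and it assumes that the least significant part of each word is stored first, which should be checked against your hex conversion utility's output.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild 32-bit words from a boot-memory image.
 * mem[]  : one element per boot-memory location; only the low `width`
 *          bits of each element are meaningful.
 * width  : boot memory width in bits (8, 16, or 32).
 * ASSUMPTION: the least significant part of each word is stored at the
 * lowest location, and locations are consecutive (no address shift). */
static size_t assemble_words(const uint32_t *mem, size_t nlocs,
                             unsigned width, uint32_t *out)
{
    unsigned per_word = 32u / width;                 /* 1, 2, or 4 locations */
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    size_t nwords = nlocs / per_word;

    for (size_t w = 0; w < nwords; w++) {
        uint32_t v = 0;
        for (unsigned k = 0; k < per_word; k++)
            v |= (mem[w * per_word + k] & mask) << (k * width);
        out[w] = v;
    }
    return nwords;                                   /* words rebuilt */
}
```

The same 32-bit value is produced whether the image comes from 8-, 16-, or 32-bit-wide memory; only the number of locations read per word changes.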
Figure 4-42. Memory Configuration for Normal Program Execution
Figure 4-43. Boot Table Memory Configuration




[Figure: ’C32 address lines A14 through A0 connected to memory banks 32 bits, 16 bits, and 8 bits wide]

Note: For external memory used during normal program execution, the amount of external address shift depends only on the
width of the memory bank.


Booting a TMS320C32 Target System in a C Environment 


4.8.6 Host Load 


While some DSP systems stand alone, others may be embedded DSPs controlled
by a host, such as a microcontroller or another DSP. During system power up,
the DSP boot table may be transferred from the host to the DSP through a
serial port or through a byte-wide latch. This eliminates the need for a
dedicated boot EPROM on the DSP side of the system. On the host side, the DSP
boot table may be temporarily stored in an EPROM prior to the DSP boot.
Following reset, the host transfers the boot table to the DSP to initialize it
and start program execution.

4.8.6.1 Boot From Serial Port 

If the DSP powers up in the microcomputer/boot loader mode (MCBL/MP high), a
low on the INT3 pin and a high on all other INTx pins cause the on-chip boot
loader program to read the boot table from the serial port. Most
microcontrollers also feature a serial port, and in many cases the two ports
can be connected directly, without additional glue logic, for an economical
host/DSP interface. Following the boot, the host can also use the serial
channel to send/receive data and to control the operation of the DSP (see
Figure 4-44 on page 4-104). Generating the boot table requires linking the
object files with the -cr option (RAM model) and then adding the boot keyword
to the hex utility’s SECTIONS directive to identify the COFF sections to be
included in the boot table.

4.8.6.2 Boot From a Latch 

If the host processor does not have a serial port, the DSP can be booted from
the host using an 8-bit latch. During the boot operation, the host feeds the
boot table bytes to one side of the latch, while the DSP reads the data from
the other. Following reset, interrupt 0, 1, or 2 (whichever is asserted) directs
the DSP boot loader to the latch address. These are the same interrupts that
make the boot loader read from the parallel (external memory) port, so some
control/decode logic is required to make the DSP read from the latch instead
of from memory. The same glue logic must also be connected to the host side
of the latch to ensure proper data-transfer synchronization between two
asynchronous systems (see Figure 4-45 on page 4-105). At power up, the DSP
boot table most likely resides in the host’s EPROM, and the host outputs the
boot table to the latch one byte at a time following reset. Creating the boot
table for this operation uses the same linker/COFF options as for the
host/serial boot and the direct EPROM boot.
4.8.6.3 Asynchronous Boot From a Communications Port

If the host processor has an asynchronous communications capability, then
the ’C32 can make a glueless connection to the host’s communication port
(see Figure 4-46 on page 4-106). In addition to the data bus, three ’C32 pins
are involved in the asynchronous boot: XF0, XF1, and IACK. The XF1 pin
serves as the data ready input to the ’C32, and XF0 is the data acknowledge.

The IACK pin pulses when no valid data is present on the data lines; this
pulse is needed for the ’C4x comm-port interface. For boot loader mode, it is
assumed that the host (such as a ’C4x) connects directly to the data ready
and data acknowledge control lines. The host drives the data ready signal low
to indicate to the DSP that the next byte of the boot table has been placed on
the data lines. The DSP responds by pulling the data acknowledge signal low
after reading the data. When the host sees the data acknowledge signal, it
stops driving the data bus and brings the data ready line high. To complete
the handshaking transaction, the DSP brings the data acknowledge signal high
to request the next byte from the host. The boot table for this type of boot
operation is created with the linker -cr option (RAM model) and the hex
conversion utility SECTIONS directive boot keyword, the same options used for
other boot load procedures involving the on-chip boot loader program.
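The four-phase handshake described above can be modeled in C to check the order of events. This is a hypothetical host-side simulation, not ’C32 or ’C4x code: Link, host_step(), dsp_step(), and handshake_transfer() are illustrative names, the control lines are modeled as active-low booleans, and the IACK pulse is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the shared wires. Both control lines are active low,
 * so false means "asserted (low)". */
typedef struct {
    bool ready_n;   /* data ready: host -> 'C32 XF1 */
    bool ack_n;     /* data acknowledge: 'C32 XF0 -> host */
    bool driven;    /* host is driving the data bus */
    uint8_t data;   /* data bus */
} Link;

/* Host: when data acknowledge is high, place a byte and pull data ready
 * low; when the DSP pulls data acknowledge low, release the bus and
 * bring data ready high. Returns the count of acknowledged bytes. */
static size_t host_step(Link *l, const uint8_t *src, size_t sent, size_t n)
{
    if (l->ready_n) {
        if (sent < n && l->ack_n) {
            l->data = src[sent];
            l->driven = true;
            l->ready_n = false;
        }
    } else if (!l->ack_n) {
        l->driven = false;
        l->ready_n = true;
        sent++;
    }
    return sent;
}

/* DSP: when data ready goes low, read the byte and pull data acknowledge
 * low; once data ready is high again, bring data acknowledge high to
 * request the next byte. Returns the count of bytes received. */
static size_t dsp_step(Link *l, uint8_t *dst, size_t got)
{
    if (l->ack_n) {
        if (!l->ready_n && l->driven) {
            dst[got++] = l->data;
            l->ack_n = false;
        }
    } else if (l->ready_n) {
        l->ack_n = true;
    }
    return got;
}

/* Run both sides alternately; each byte needs two rounds. */
static size_t handshake_transfer(const uint8_t *src, size_t n, uint8_t *dst)
{
    Link l = { true, true, false, 0 };
    size_t sent = 0, got = 0;

    for (size_t round = 0; round < 2 * n + 2; round++) {
        sent = host_step(&l, src, sent, n);
        got = dsp_step(&l, dst, got);
    }
    return got;
}
```

Because each side only acts on a level change from the other, the transfer stays correct no matter how slowly either side runs, which is what makes the connection usable between two asynchronous processors.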
Figure 4-44. Boot From Host Using Serial Port (Linker -cr Option)
4.9 TMS320C30 Addressing up to 68 Gigawords

The ’C30 primary bus has 24 address lines, which allow addressing up to
16 megawords of memory. The ’C30 expansion bus has 13 address lines,
addressing 8K words. These two busses, the expansion bus address lines
[XA(12-0)] and the primary lines [A(23-0)], can be used simultaneously to
extend the address to 36 bits. This is accomplished by using the feature of
the ’C3x family that holds the last address on an external bus until a new
external access occurs; in effect, the address bus works as a latch.
Figure 4-47 shows how the two busses are combined. The following parallel
instruction accomplishes this task:

        STI   Rx,*ARn     ; address MSTRB while loading a
||      LDI   *ARp,Rq     ; value from STRB memory

where:

Rx and Rq designate registers R0 to R7 (but not the same register).
ARn and ARp designate auxiliary registers AR0 to AR7 (but not the same
register).

Note:

ARn contains the 8-Mword segment address plus 800000h. ARp contains
the address within the 8-Mword segment and is between 0 and 7FFFFFh.

Figure 4-47. TMS320C30 Combination of Primary and Expansion Busses to Address 68 Gigawords
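The address split described in the note can be expressed as a short C sketch. It assumes that the 13-bit segment number is the low part of ARn (so it appears on XA(12-0) when the STI writes to the expansion bus) and that the offset within the segment fits in ARp's 23 bits, as the note states. ArPair, split_address(), and join_address() are hypothetical helper names. Thirteen segment bits plus 23 offset bits give 2^36 = 68,719,476,736 words, which is the source of the 68-gigaword figure.

```c
#include <stdint.h>

#define OFFSET_BITS  23u          /* ARp: 0 .. 7FFFFFh within a segment */
#define SEGMENT_BITS 13u          /* XA(12-0): 8K segments */
#define MSTRB_BASE   0x800000u    /* ARn = segment + 800000h */

typedef struct {
    uint32_t arn;                 /* value used by the STI (expansion bus) */
    uint32_t arp;                 /* value used by the LDI (primary bus)   */
} ArPair;

/* Split a 36-bit word address into the two auxiliary register values. */
static ArPair split_address(uint64_t addr36)
{
    ArPair p;
    uint32_t segment = (uint32_t)(addr36 >> OFFSET_BITS)
                       & ((1u << SEGMENT_BITS) - 1u);
    p.arn = MSTRB_BASE + segment;
    p.arp = (uint32_t)(addr36 & ((1u << OFFSET_BITS) - 1u));
    return p;
}

/* Inverse: rebuild the 36-bit address from ARn and ARp. */
static uint64_t join_address(ArPair p)
{
    return ((uint64_t)(p.arn - MSTRB_BASE) << OFFSET_BITS) | p.arp;
}
```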
Chapter 5


Programming Tips 


Programming style reflects personal preference. The purpose of this chapter
is not to impose any particular style, but to highlight features of the ’C3x that
can produce faster and/or shorter programs. The tips cover the C compiler,
assembly language programming, and low-power mode wakeup.
5.1 Hints for Optimizing C Code

The ’C3x was designed with a large register file, software stack, and memory
space that easily support the floating-point C compiler. The C compiler
translates ANSI C programs into assembly language source code. Writing in C
also increases code portability and decreases application porting time.

After writing your application in C, debug the program and determine
whether it runs efficiently. If the program does not run efficiently:

□ Use the optimizer with -o2 or -o3 options when compiling 

□ Use registers to pass parameters (-ms compiling option) 

□ Use inlining (-x compiling option) 

□ Remove the -g option when compiling 

□ Follow some of the efficient code generation tips listed below 

Identify places where most of the execution time is spent and optimize these areas
by writing assembly language routines that implement the functions. Call the
routines from the C program as C functions.

The efficiency of the code generated by the floating-point compiler depends
to a large extent on how you write your C code. Specific constructs can vastly
improve the compiler’s effectiveness:

□ Use register variables for often-used variables. This is particularly true
for pointer variables. Example 5-1 shows a code fragment that exchanges
one object in memory with another.

Example 5-1. Exchanging Objects in Memory 


register float *src, *dest, temp;

do
{
    temp  = *++src;
    *src  = *++dest;
    *dest = temp;
}
while (--n);



□ Precompute subexpressions. This especially applies to array references
in loops. Assign commonly used expressions to register variables,
where possible.

□ Use *++ to step through arrays rather than using an index to recalculate 
the address each time through a loop. 
As an example of the previous two points, consider the loops in Example 5-2.
Example 5-2. Optimizing a Loop 


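/* Loop 1 */
void main(void)
{
    float a[10], b[10];
    int i;

    for (i = 0; i < 10; ++i)
        a[i] = (a[i] * 20) + b[i];
}

/* Loop 2 */
void main(void)
{
    float a[10], b[10];
    int i;
    register float *p = a, *q = b;

    /* p is advanced in the loop header so that it is not read and
       modified in the same expression */
    for (i = 0; i < 10; ++i, ++p)
        *p = (*p * 20) + *q++;
}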


Loop 1 executes in 19 cycles. Loop 2, which is equivalent to loop 1,
executes in 12 cycles.

□ Use structure assignments to copy blocks of data. The compiler generates
very efficient code for structure assignments, so nest objects within
structures and use simple assignments to copy them.
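A minimal sketch of this tip, assuming a hypothetical 64-element float block: wrapping the array in a structure lets a single assignment copy the whole block. Block, BLOCK_LEN, and copy_block() are illustrative names; how efficiently the copy is carried out depends on the compiler.

```c
#define BLOCK_LEN 64               /* hypothetical block size */

typedef struct {
    float v[BLOCK_LEN];
} Block;

/* One structure assignment copies the whole array; the compiler chooses
 * the copy sequence instead of an explicit element-by-element loop. */
static void copy_block(Block *dst, const Block *src)
{
    *dst = *src;
}
```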

□ Avoid large local frames and declare the most often used local variables
first. The compiler uses indirect addressing with an 8-bit offset to
access local data. To access objects on the local frame with offsets greater
than 255, the compiler must first load the offset into an index register. This
requires one extra instruction and incurs two cycles of pipeline delay.
□ Avoid the large model. The large model is inefficient because the compiler
reloads the data-page pointer (DP) before each access to a global or
static variable. If you have large array objects, use malloc() to dynamically
allocate them and access them via pointers rather than declaring them
globally. Example 5-3 illustrates two methods for allocating large array
objects.

Example 5-3. Allocating Large Array Objects 


/* Inefficient Method */

int a[1000000];                                   /* Inefficient */

a[i] = 10;

/* Efficient Method */

int *a = (int *)malloc(1000000 * sizeof(int));    /* Efficient */

a[i] = 10;
5.2 Hints for Assembly Coding

Each program has unique requirements. Not all possible optimizations are
appropriate in every case. You can use the suggestions in this section as a
checklist of available software tools.

□ Use delayed branches. Delayed branches execute in a single cycle; regular
branches execute in four cycles. The next three instructions are executed
whether the branch is taken or not. If fewer than three instructions are
required, use the delayed branch and append no-operation instructions
(NOPs). A reduction in machine cycles still occurs.

□ Apply the repeat single/block construct. In this way, loops are achieved
with no overhead. Nesting such constructs does not normally increase
efficiency, so try to use the feature on the most often performed loop. Note
that the RPTS instruction is not interruptible and that the repeated instruction
is not refetched for execution. This frees the buses for operand fetches.

□ Use parallel instructions. It is possible to perform a multiply in parallel 
with an add (or subtract) and to execute stores in parallel with any multiply 
or arithmetic logic unit (ALU) operation. This increases the number of 
operations executed in a single cycle. For maximum efficiency, observe 
the addressing modes used in parallel instructions and arrange the data 
appropriately. It is possible to have loads in parallel with any multiply or add 
(or subtract) by multiplying by 1 or adding a 0. Therefore, to implement 
parallel instructions with a data load, substitute a multiply or an add 
instruction with one extra register containing 1 or 0, respectively, in place 
of a load instruction. 

□ Maximize the use of registers. The registers are an efficient way to 
access scratch-pad memory. Extensive use of the register file facilitates 
the use of parallel instructions and helps avoid pipeline conflicts when you 
use the registers in addressing modes. 

□ Use the cache. This is especially important in conjunction with slow
external memory. The cache is transparent to the user, so make sure that it
is enabled.

□ Use internal memory instead of external memory. The internal
memory (2K x 32 bits RAM and 4K x 32 bits ROM) is considerably faster
to access. In a single cycle, two operands can be brought from internal
memory. You can maximize performance if you use direct memory access
(DMA) in parallel with the CPU to transfer data to internal memory
before you operate on it.

□ Avoid pipeline conflicts. For time-critical operations, make sure you do 
not miss any cycles because of pipeline conflicts. 
The preceding checklist is not exhaustive, and it does not address the detailed
features outlined in other chapters of this manual. To learn how to exploit the 
full power of the ’C3x, study the architecture, hardware configuration, and 
instruction set of the device described in the TMS320C3x User’s Guide. 
5.3 Low-Power Mode Wakeup Example

There are two instructions by which the ’C31, ’LC31, and ’C32 are placed in 
the low-power consumption mode: 

□ IDLE2 

□ LOPOWER 

The LOPOWER instruction slows down the H1/H3 clock by a factor of 16 during
the read phase of the instruction. The MAXSPEED instruction wakes the
device from the low-power mode and returns it to full frequency during
MAXSPEED’s read cycle. However, the H1/H3 clock may resume in the phase
opposite to the one it was in before the clocks were shut down.

The IDLE2 instruction has the same functions as the IDLE instruction,
except that the clock is stopped during the execute phase of the IDLE2
instruction. The clock pin stops with H1 high and H3 low. The status of all
the signals remains the same as in the execute phase of the IDLE2
instruction. In emulation mode, however, the clocks continue to run, and
IDLE2 operates identically to IDLE. Apart from a device reset, the external
interrupts INT(0-3) are the only signals that can restart the processor
from IDLE2. Therefore, you must enable the external interrupt before going
to IDLE2 power-down mode (see Example 5-4). If the proper external
interrupt is not set up before executing IDLE2 to power down, the only way
to wake up the processor is with a device reset.
Example 5-4. Setup of IDLE2 Power-Down Mode Wakeup


*
*       TITLE IDLE2 POWER-DOWN MODE WAKEUP ROUTINE SETUP
*
*       THIS EXAMPLE SETS UP THE EXTERNAL INTERRUPT 0, INT0, BEFORE
*       EXECUTING THE IDLE2 INSTRUCTION. WHEN THE INT0 SIGNAL IS RECEIVED
*       LATER, THE PROCESSOR WILL RESUME FROM ITS PREVIOUS
*       STATE. NOTE: THE "INTRPT" SECTION IS MAPPED TO ADDRESS 0
*       FOR THE RESET AND INTERRUPT VECTORS.
*
        .sect   "INTRPT"

RESET   .word   START           ; Reset vector
INT0    .word   INT0_ISR        ; INT0 interrupt vector
INT1    .word   INT1_ISR        ; INT1 interrupt vector
INT2    .word   INT2_ISR        ; INT2 interrupt vector
INT3    .word   INT3_ISR        ; INT3 interrupt vector

        .text
START   LDP     @SP_ADR
        LDI     @SP_ADR,SP      ; Set up stack pointer
        OR      01h,IE          ; Enable INT0
        IDLE2                   ; Set GIE = 1 and stop clock

INT0_ISR RETI                   ; Return to instruction after IDLE2


There is one cycle of delay while waking up the processor from the IDLE2
power-down mode before the clocks start up. This adds one extra cycle from
the time the interrupt pin goes low until the interrupt is taken. The interrupt pin
needs to be low for at least two cycles. The clocks may start up in the phase
opposite the phase that they were in before the clocks were stopped.
5.4 Bit-Reversed Addressing in C

The C language does not have any construct to take advantage of the
bit-reversed addressing feature of the ’C3x. Figure 5-1 shows how
assembly instructions can be added to C code to use bit-reversed
addressing.

Figure 5-1. Bit-Reversed Addressing in C Code 


#define N 16

int x[N] = { 0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15 };
int y[N] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };
int bitrev(int m, int n);

void main()
{
    int i;

    asm("        PUSH   AR5");
    asm("        PUSH   AR0");
    asm("        LDI    8,IR0          ; Initialize IR0 to 1/2 N");
    asm("        LDI    @CONST+0,AR5   ; AR5 <- address of x[]");
    asm("        LDI    @CONST+1,AR0   ; AR0 <- address of y[]");

    for (i = 0; i < N; i++) {
        /* y[bitrev(i,N)] = x[i]; */
        asm("    LDI    *AR5++(IR0)B,R0");
        asm("    STI    R0,*AR0++");
    }

    asm("        POP    AR0");
    asm("        POP    AR5");
}

/* These statements reserve a two-word CONST table in .bss and use a
   .cinit record to initialize it with the addresses of x and y. */

asm("        .bss   CONST,2");
asm("        .sect  \".cinit\"");
asm("        .word  2,CONST");
asm("        .word  _x");
asm("        .word  _y");
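Figure 5-1 declares bitrev() but does not define it. The sketch below gives one possible portable definition and models the reverse-carry index update performed by *AR5++(IR0)B with IR0 = N/2, so that the effect of the assembly loop can be checked in plain C. The definitions of bitrev(), revcarry_add(), and bitrev_copy() are assumptions for illustration; only the index within the array is modeled, not the full auxiliary register.

```c
/* Hypothetical definition for the bitrev() prototype in Figure 5-1:
 * reverse the low log2(n) bits of m. n must be a power of 2. */
int bitrev(int m, int n)
{
    int r = 0;
    for (int k = n; k > 1; k >>= 1) {   /* log2(n) iterations */
        r = (r << 1) | (m & 1);
        m >>= 1;
    }
    return r;
}

/* Reverse-carry addition within an n-element buffer: add step to idx as
 * if both were written with their bits reversed, so the carry moves from
 * the most significant bit toward the least significant bit. */
static int revcarry_add(int idx, int step, int n)
{
    return bitrev((bitrev(idx, n) + bitrev(step, n)) & (n - 1), n);
}

/* C model of the loop in Figure 5-1: read x[] in bit-reversed order
 * (step = n/2, like IR0) and store into y[] sequentially. */
static void bitrev_copy(const int *x, int *y, int n)
{
    int idx = 0;                         /* offset of AR5 into x[] */
    for (int i = 0; i < n; i++) {
        y[i] = x[idx];                   /* LDI *AR5++(IR0)B,R0 / STI R0,*AR0++ */
        idx = revcarry_add(idx, n / 2, n);
    }
}
```

With the x[] table of Figure 5-1, bitrev_copy() leaves y[] holding 0 through 15 in order, because reading x[] in bit-reversed order undoes the bit-reversed order in which x[] was initialized.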
5.5 Sharing Header Files in C and Assembly

Sometimes it is useful to be able to define named constants that can be used
in both C and assembly language.

One method is to have separate header files that define the same symbols: 
a C include file with #define directives and an assembler include file with .set 
or .asg directives. However, it is more convenient to have a single, shared 
header file that defines symbols once for C and assembly. 

Figure 5-2 shows how a file can be used normally as a C include file and also
to generate an assembler include file. Compiling the file with ASMDEFS defined
generates an assembler include file from it; use the following command:

cl30 -dASMDEFS -k defs.h 

Figure 5-2. Input File defs.h 


#define PI 3.14
#define E  2.72

#ifdef ASMDEFS      /* IF DEFINED, CREATE .asg DIRECTIVES */
#define ASM_ASG(sym) asm("\t.asg\t" VAL(sym) "," #sym)
#define VAL(sym) #sym

ASM_ASG(PI);
ASM_ASG(E);

#endif /* ASMDEFS */


The output is the file defs.asm, which contains .asg directives for your symbols 
(see Figure 5-3). 

Figure 5-3. Output File defs.asm 


; ... <compiler-generated header stuff> ... 

.asg 3.14,PI 
.asg 2.72,E 


You can then include defs.asm in your assembly modules with the .include
directive. The same technique can be used to create .set directives rather than .asg.
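The macro trick in defs.h relies on two levels of expansion: #sym turns the argument into a string exactly as written, while passing the argument through VAL() first lets it expand to its value. The sketch below checks this on any C compiler by producing a string instead of an asm() statement. ASG_TEXT() and print_asg_directives() are hypothetical names; the comma between value and name matches the .asg syntax shown in Figure 5-3.

```c
#include <stdio.h>

#define PI 3.14
#define E  2.72

/* Same two-level expansion as ASM_ASG in defs.h, but yielding a string
 * literal that can be printed on a host instead of an asm() statement.
 * #sym stringizes the name as written (PI); VAL(sym) lets the argument
 * expand to its value (3.14) before VAL stringizes it. */
#define VAL(sym)      #sym
#define ASG_TEXT(sym) "\t.asg\t" VAL(sym) "," #sym

static void print_asg_directives(void)
{
    puts(ASG_TEXT(PI));   /* tab, .asg, tab, 3.14,PI */
    puts(ASG_TEXT(E));    /* tab, .asg, tab, 2.72,E  */
}
```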


5.6 Addressing Peripherals as Data Structures in C 

A data structure is usually assigned to the .bss section by the C compiler. The
.bss section stores global and statically allocated variables. A peripheral, such
as a serial port, has memory-mapped control registers at addresses different
from .bss. To manipulate a memory-mapped peripheral register in C, follow
one of the methods listed below.


□ Method 1: Use a pointer to the peripheral.

[Figure: a pointer holding address 0x808000 points to the peripheral, treated as a set of memory locations]


1) Declare a structure that logically represents the memory locations of
the peripheral.

struct controller {
    unsigned int status;
    /* ... remaining registers of the peripheral ... */
};


2) Declare a pointer to the structure and initialize it to the peripheral’s
address.

struct controller *IFperipheral = (struct controller *) 0x808000;

3) In your code, access the peripheral’s memory values indirectly. 

IFperipheral->status = 0; 

□ Method 2: Place the structure in its own section. 

1) Declare the peripheral structure itself instead of a pointer. Declare it
extern so that the compiler does not also reserve space for it in .bss.

extern struct controller IFperiph;

2) Use inline assembly to give the structure its own section.

asm("_IFperiph .usect \"periph\",128");

/* 128 is size of struct */

This creates a user-defined section that can be linked to any
address.

3) Use your linker command file to map the section to memory. 

periph: load = 0x808000 

4) Address the structure elements directly. 

IFperiph.status = 0; 


Method 1 is very useful for addressing peripherals or memory buffers that are
device specific. Method 2 is preferred for addressing peripherals or memory
buffers that are not device specific (that is, peripherals that are user
specified). This method leaves the task of mapping and aligning user-specific
peripherals and/or memory buffers to the linker. The choice depends on your
individual application.

See Section 5.7 for another method of placing the structure in its own section
using #pragma directives.
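Method 1 can be exercised on a host by pointing the structure pointer at an ordinary object. This is a sketch under stated assumptions: the two-register layout of struct controller, fake_periph, and periph_clear() are hypothetical, and the volatile qualifier is an addition not used in the declarations above. It is shown because it keeps the compiler from caching or removing accesses to memory-mapped registers.

```c
#include <stdint.h>

/* Hypothetical register layout for illustration only. */
struct controller {
    uint32_t status;
    uint32_t control;
};

/* On the target, Method 1 would use
 *     (volatile struct controller *) 0x808000
 * Here an ordinary object stands in for the peripheral so the sketch
 * runs on a host. */
static struct controller fake_periph = { 0x5u, 0x1u };
static volatile struct controller *const IFperipheral = &fake_periph;

/* volatile makes every read and write reach the register instead of
 * being kept in a CPU register or optimized away. */
static void periph_clear(volatile struct controller *p)
{
    p->status = 0;
    p->control = 0;
}
```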


5.7 Linking C Data Objects Separate From the .bss Section 

The TMS320 DSP C compilers produce several relocatable blocks of code
and data when C code is compiled. These blocks are called sections and can
be allocated into memory in a variety of ways to conform to a variety of system
configurations. The .bss section is used by the compiler for global and static
variables; it is one of the default COFF sections used to reserve a specified
amount of space in the memory map that can later be used for storing data.
It is normally uninitialized. All global and static variables in a C program are
placed in the .bss section. Sometimes, however, a variable must be linked
separately from the rest of .bss. For example, on the floating-point DSPs, you
might want to link all of your variables into off-chip memory but place a
frequently used array in on-chip RAM block 0.

□ Method A: Declare variable in a separate section. 

1) Declare the variable that is to be separated from the .bss section in a 
separate file. For example, declare a 32-word array, tapDelay [ ], in a 
file called array.c as follows: 

/* File: ARRAY.C */ 
int tapDelay[32];

/* End of file */ 

2) Declare the variable as extern in any file that makes a reference to it.
Consider the following file, test.c, which makes a reference to the array
declared in file array.c:

/* File: TEST.C */

extern int tapDelay[];

void main(void)
{
    int i;

    for (i = 0; i < 32; i++)
        tapDelay[i] = 0;
}

/* End of file */


3) In the linker command file, link this variable separately from the .bss
section in the SECTIONS section. The following linker command file
segment illustrates how to link the array tapDelay[ ] onto the ’C3x on-chip,
dual-access data RAM block 0 while linking the rest of the global
and static variables into a similar data RAM block 1:

/* File: TEST.CMD */

test.obj
array.obj

MEMORY
{
    RAMB0: origin = 0x809800, length = 0x400
    RAMB1: origin = 0x809c00, length = 0x400
}

SECTIONS
{
    .bss         : {} > RAMB1
    tapdelayline : { array.obj(.bss) } > RAMB0
}

/* End of file */


□ Method B: Declare variable in a #pragma DATA_SECTION.

1) Declare the variable that is to be separated from the .bss section in a
#pragma DATA_SECTION. Consider the example described in Method A.
The following code segment uses the DATA_SECTION pragma
to declare a 32-word array, tapDelay[ ], that is placed separately from
the other global and static variables:

/* File: TEST.C */

#pragma DATA_SECTION (tapDelay, ".tapdelayline")
int tapDelay[32];

void main(void)
{
    int i;

    for (i = 0; i < 32; i++)
        tapDelay[i] = 0;
}

/* End of file */


2) In the linker command file, use the section name .tapdelayline to place 
the array tapDelay [ ] in RAM block 0. Separate it from the other global 
and static variables that are in the .bss section as follows: 

/* File: TEST.CMD */

test.obj

MEMORY
{
    EXT0: origin = 0x100,    len = 0x3f00
    RAM0: origin = 0x809800, len = 0x400
}

SECTIONS
{
    .bss          : {} > EXT0
    .tapdelayline : {} > RAM0
}

/* End of file */

Method B is available in the floating-point DSP C compiler version 4.60 or 
greater. It is described in the TMS320 Floating-Point DSP Code Generation 
Tools Release 4.70 Getting Started Guide. 


5.8 Interrupts in C 

To use interrupts in C, you must write an interrupt service routine (ISR),
initialize the interrupt vector table, and link these parts with the linker
command file. These steps are described below.

Step 1: Write a C language interrupt service routine (ISR).

The C compiler requires that each ISR be named as follows:

void c_int0n(void)    /* n is the interrupt number */
{
    /* a C function that is an ISR */
}

The interrupt routine must not return a value and has no arguments.
The C compiler recognizes this naming convention and treats the function
as an ISR. This means it performs a context save of the necessary
registers and returns from the routine via an RETI instruction.

A good practice is to include the interrupts in a separate file called 
ints.c or something similar. This allows a modular style, simpler 
maintenance, and software that is easy to understand. 

Step 2: Initialize the interrupt vector table using either C or assembly
language.

In microprocessor mode of the ’C30 and ’C31, the first 0x40 addresses
are reserved for the interrupt and trap vectors. Address 0 (zero)
holds the address of the reset routine. If you use the -c linker option,
the boot.asm module of the rts30.lib runtime library takes care of
defining the reset function, but the vector table initialization is left to you.

An assembly language routine might look like this: 

; file name is vectors.asm

        .sect  "vectors"      ; a new section begins here
        .word  _c_int00       ; the address of the reset vector
        .word  _c_int01       ; the ISR for interrupt 0
        .word  _c_int02       ; the ISR for interrupt 1
        ; etc.
        .end

This routine creates a new section that is merely a list of addresses
where the interrupt routines can be found. It can be written in C by
encapsulating each line in an asm statement.

For example:

asm(" .sect \"vectors\" ");
Step 3: Link the interrupt service routine (ISR) and the initialized interrupt
vector table with the linker command file. 

The linker command file provides the mechanism for including the 
vectors.asm object and the ints.c object. 

/* file name = mylink.cmd */
vectors.obj 
ints.obj 

The MEMORY section needs to identify the location of the interrupt
vectors.

MEMORY 

{ 

VECTORS: origin = 0h, length = 40h

} 

The SECTIONS section needs to map the user-defined section 
called vectors to the memory location. 

SECTIONS
{
    vectors: > VECTORS
}
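Because the vector table is just a list of addresses, its behavior can be modeled on a host with an array of function pointers. This is an illustrative model, not a way to build the ’C3x table: the handler names are hypothetical stand-ins for c_int00, c_int01, and c_int02, and take_vector() only imitates the CPU fetching an address from the table and branching to it.

```c
#include <stddef.h>

typedef void (*vector_t)(void);

/* Hypothetical handlers standing in for c_int00, c_int01, c_int02. */
static int calls[3];
static void on_reset(void) { calls[0]++; }
static void on_int0(void)  { calls[1]++; }
static void on_int1(void)  { calls[2]++; }

/* Same layout as vectors.asm: slot 0 = reset, slot n+1 = interrupt n. */
static const vector_t vectors[] = { on_reset, on_int0, on_int1 };

/* Model of taking a vector: fetch the address from the table and call it.
 * Slots outside the table are ignored. */
static void take_vector(size_t slot)
{
    if (slot < sizeof vectors / sizeof vectors[0])
        vectors[slot]();
}
```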
Chapter 6


DSP Algorithms 


Certain features of the ’C3x architecture and instruction set facilitate the
solution of numerically intensive problems. This chapter presents examples of
applications using these features, such as companding, filtering, fast Fourier
transforms (FFTs), and matrix arithmetic.


6.1 Companding 


In telecommunications, conserving channel bandwidth while preserving
speech quality is a primary concern. This is achieved by quantizing the speech
samples logarithmically. An 8-bit logarithmic quantizer produces speech
quality equivalent to that of a 13-bit uniform quantizer. The logarithmic
quantization is achieved by companding (COMpress/exPANDing). Two
international standards have been established for companding: the μ-law
standard (used in the United States and Japan) and the A-law standard (used
in Europe). Detailed descriptions of μ-law and A-law companding are included
in Volume 1 of the book Digital Signal Processing Applications With the TMS320 Family.

During transmission, logarithmically compressed data in sign-magnitude form
is transmitted along the communications channel. If any processing is
necessary, you must expand this data to a 14-bit (for μ-law) or 13-bit (for A-law)
linear format. This operation is performed when the data is received at the
digital signal processor (DSP). After processing, the result is compressed back
to 8-bit format and transmitted through the channel to continue transmission.

Example 6-1 and Example 6-2 show μ-law compression and expansion (that
is, linear to μ-law and μ-law to linear conversion), while Example 6-3 and
Example 6-4 show A-law compression and expansion. For expansion, using
a look-up table is an alternative approach. A look-up table trades memory
space for speed of execution. Since the compressed data is eight bits long, you
can construct a table with 256 entries containing the expanded data. If the
compressed data is stored in register AR0, the following two instructions
put the expanded data in register R0:

        ADDI   @TABL,AR0      ; @TABL = BASE ADDRESS OF TABLE
        LDI    *AR0,R0        ; PUT EXPANDED NUMBER IN R0

You could use the same look-up table approach for compression, but the
required table length would be 16384 words for μ-law and 8192 words for A-law.
If this memory size is not acceptable, use the subroutines presented in
Example 6-1 or Example 6-3.
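The look-up approach can be sketched in C. The table below is filled by a function that repeats the steps of MUXPND in Example 6-2 (complement, isolate the quantization bin and segment, form the 1xxxx1 pattern, shift, and remove the bias), so the entries follow this document's expansion routine rather than being copied from a standards table. mu_expand(), mu_table, build_mu_table(), and mu_lookup() are hypothetical names.

```c
#include <stdint.h>

/* Mirrors MUXPND: returns the linear value for one compressed byte. */
static int mu_expand(uint8_t code)
{
    unsigned c = (unsigned)(~code) & 0xFFu;       /* complement bits */
    int quant = (int)(c & 0x0Fu);                 /* quantization bin */
    int seg = (int)((c >> 4) & 0x07u);            /* segment code */
    int mag = (((quant << 1) + 33) << seg) - 33;  /* 1xxxx1, shift, unbias */
    return (c & 0x80u) ? -mag : mag;              /* apply sign bit */
}

static int mu_table[256];

/* Fill the 256-entry table once, ahead of time. */
static void build_mu_table(void)
{
    for (int i = 0; i < 256; i++)
        mu_table[i] = mu_expand((uint8_t)i);
}

/* Equivalent of ADDI @TABL,AR0 / LDI *AR0,R0: index by the code. */
static int mu_lookup(uint8_t code)
{
    return mu_table[code];
}
```

The table costs 256 words of memory, while each expansion becomes a single indexed load instead of the 14-instruction subroutine.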


Example 6-1. μ-Law Compression


*
*       TITLE U-LAW COMPRESSION
*
*       SUBROUTINE MUCMPR
*
*       ARGUMENT ASSIGNMENTS:
*         ARGUMENT  |  FUNCTION
*       ------------+---------------------------
*         R0        |  NUMBER TO BE CONVERTED
*
*       REGISTERS USED AS INPUT: R0
*       REGISTERS MODIFIED: R0, R1, R2, SP
*       REGISTER CONTAINING RESULT: R0
*
*       NOTE: SINCE THE STACK POINTER 'SP' IS USED IN THE COMPRESSION
*       ROUTINE 'MUCMPR', MAKE SURE TO INITIALIZE IT IN THE
*       CALLING PROGRAM.
*
*       CYCLES: 20      WORDS: 17
*
        .global MUCMPR
*
MUCMPR  LDI     R0,R1           ; Save sign of number
        ABSI    R0,R0
        CMPI    1FDEH,R0        ; If R0>0x1FDE,
        LDIGT   1FDEH,R0        ; saturate the result
        ADDI    33,R0           ; Add bias
        FLOAT   R0              ; Normalize: (seg+5)0WXYZx...x
        MPYF    0.03125,R0      ; Adjust segment number by 2**(-5)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
        LDI     0,R2
        LDI     R1,R1           ; If number is negative,
        LDILT   80H,R2          ; set sign bit
        ADDI    R2,R0           ; R0 = compressed number
        NOT     R0              ; Reverse all bits for transmission
        RETS
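For host-side testing, the steps of MUCMPR can be mirrored in C. In this sketch, mu_compress() is a hypothetical name, and a loop that finds the leading 1 bit stands in for the FLOAT/MPYF/PUSHF/POP sequence that the assembly uses to obtain the segment number and the four bits WXYZ.

```c
#include <stdint.h>

/* Mirrors MUCMPR: linear input (saturated to +/-1FDEH) -> 8-bit code. */
static uint8_t mu_compress(int x)
{
    /* ABSI with saturation; avoids negating out-of-range inputs */
    int mag = (x > 0x1FDE || x < -0x1FDE) ? 0x1FDE : (x < 0 ? -x : x);
    mag += 33;                                /* add bias: 33 .. 8191 */

    int top = 0;                              /* position of leading 1 */
    while ((mag >> (top + 1)) != 0)
        top++;

    int seg = top - 5;                        /* segment number 0 .. 7 */
    int quant = (mag >> (top - 4)) & 0x0F;    /* WXYZ after leading 1 */
    int code = (seg << 4) | quant;
    if (x < 0)
        code |= 0x80;                         /* set sign bit */
    return (uint8_t)(~code & 0xFF);           /* invert for transmission */
}
```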
Example 6-2. μ-Law Expansion


*       TITLE U-LAW EXPANSION
*
*       SUBROUTINE MUXPND
*
*       ARGUMENT ASSIGNMENTS:
*         ARGUMENT  |  FUNCTION
*       ------------+---------------------------
*         R0        |  NUMBER TO BE CONVERTED
*
*       REGISTERS USED AS INPUT: R0
*       REGISTERS MODIFIED: R0, R1, R2, SP
*       REGISTER CONTAINING RESULT: R0
*
*       CYCLES: 20 (WORST CASE)   WORDS: 14
*
        .global MUXPND
*
MUXPND  NOT     R0,R0           ; Complement bits
        LDI     R0,R1
        AND     0FH,R1          ; Isolate quantization bin
        LSH     1,R1
        ADDI    33,R1           ; Add bias to introduce 1xxxx1
        LDI     R0,R2           ; Store for sign bit
        LSH     -4,R0
        AND     7,R0            ; Isolate segment code
        LSH3    R0,R1,R0        ; Shift and put result in R0
        SUBI    33,R0           ; Subtract bias
        TSTB    80H,R2          ; Test sign bit
        RETSZ
        NEGI    R0              ; Negate if a negative number
        RETS
Example 6-3. A-Law Compression

*
*       TITLE A-LAW COMPRESSION
*
*       SUBROUTINE ACMPR
*
*       ARGUMENT ASSIGNMENTS:
*         ARGUMENT  |  FUNCTION
*       ------------+---------------------------
*         R0        |  NUMBER TO BE CONVERTED
*
*       REGISTERS USED AS INPUT: R0
*       REGISTERS MODIFIED: R0, R1, R2, SP
*       REGISTER CONTAINING RESULT: R0
*
*       NOTE: SINCE THE STACK POINTER 'SP' IS USED IN THE COMPRESSION
*       ROUTINE 'ACMPR', MAKE SURE TO INITIALIZE IT IN THE
*       CALLING PROGRAM.
*
*       CYCLES: 22      WORDS: 19
*
        .global ACMPR
*
ACMPR   LDI     R0,R1           ; Save sign of number
        ABSI    R0,R0
        CMPI    1FH,R0          ; If R0<0x20,
        BLED    END             ; do linear coding
        CMPI    0FFFH,R0        ; If R0>0xFFF,
        LDIGT   0FFFH,R0        ; saturate the result
        LSH     -1,R0           ; Eliminate rightmost bit
        FLOAT   R0              ; Normalize: (seg+3)0WXYZx...x
        MPYF    0.125,R0        ; Adjust segment number by 2**(-3)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
END     LDI     0,R2
        LDI     R1,R1           ; If number is negative,
        LDILT   80H,R2          ; set sign bit
        ADDI    R2,R0           ; R0 = compressed number
        XOR     0D5H,R0         ; Invert even bits
*                               ; for transmission
        RETS
Example 6-4. A-Law Expansion

* TITLE A-LAW EXPANSION
*
* SUBROUTINE AXPND
*
* ARGUMENT ASSIGNMENTS:
*
* ARGUMENT | FUNCTION
* ---------+-----------------------
* R0       | NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, SP
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 25 (WORST CASE)   WORDS: 16
*
        .global AXPND
*
AXPND   XOR    0D5H,R0       ; Invert even bits
        LDI    R0,R1
        AND    0FH,R1        ; Isolate quantization bin
        LSH    1,R1
        LDI    R0,R2         ; Store for sign bit
        LSH    -4,R0
        AND    7,R0          ; Isolate segment code
        BZ     SKIP1         ; Segment 0 has no leading 1
        SUBI   1,R0
        ADDI   32,R1         ; Create 1xxxx1
SKIP1   ADDI   1,R1          ; or 0xxxx1
        LSH3   R0,R1,R0      ; Shift and put result in R0
        TSTB   80H,R2        ; Test sign bit
        RETSZ                ; Return if positive
        NEGI   R0            ; Negate if a negative number
        RETS
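The A-law pair can be modeled in C the same way. This is again a hypothetical sketch (`a_compress` and `a_expand` are illustrative names) that follows Examples 6-3 and 6-4 step by step: magnitudes below 0x20 are coded linearly as mag >> 1, larger ones are saturated at 0xFFF, shifted right once, and split into a segment and four mantissa bits, and the result is XORed with 0xD5 for transmission.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C model of ACMPR: linear sample -> 8-bit A-law code. */
static uint8_t a_compress(int sample)
{
    int mag = abs(sample);
    int code;
    if (mag < 0x20) {
        code = mag >> 1;                  /* segment 0: linear coding */
    } else {
        if (mag > 0xFFF)
            mag = 0xFFF;                  /* saturate */
        mag >>= 1;                        /* drop rightmost bit: 16..2047 */
        int seg = 0;
        while ((mag >> (seg + 4)) != 0)   /* seg = (position of MSB) - 3 */
            seg++;
        assert(seg >= 1 && seg <= 7);
        code = (seg << 4) | ((mag >> (seg - 1)) & 0x0F);
    }
    if (sample < 0)
        code |= 0x80;                     /* sign bit */
    return (uint8_t)(code ^ 0xD5);        /* invert even bits for transmission */
}

/* Hypothetical C model of AXPND: 8-bit A-law code -> linear sample. */
static int a_expand(uint8_t code)
{
    int c = code ^ 0xD5;                  /* undo the transmission inversion */
    int bin = c & 0x0F;
    int seg = (c >> 4) & 0x07;
    int mag = (seg == 0) ? (bin << 1) + 1                 /* 0xxxx1 */
                         : ((bin << 1) + 33) << (seg - 1); /* 1xxxx1, shifted */
    return (c & 0x80) ? -mag : mag;
}
```

As with the µ-law model, a round trip returns a nearby value rather than the exact input; for example, -100 becomes 0x7C and expands to -102.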
FIR, IIR, and Adaptive Filters


6.2 FIR, IIR, and Adaptive Filters

Digital filters are a common requirement for DSPs. There are two types of digital filters: finite impulse response (FIR) and infinite impulse response (IIR). Both types can have either fixed or adaptable coefficients. This section presents the fixed-coefficient filters first, followed by the adaptive filters.


6.2.1 FIR Filters 


If the FIR filter has an impulse response h[0], h[1], ..., h[N-1], and x[n] represents the input of the filter at time n, the output y[n] at time n is given by this equation:

$$y[n] = h[0]\,x[n] + h[1]\,x[n-1] + \cdots + h[N-1]\,x[n-(N-1)]$$

Two features of the ’C3x that facilitate the implementation of FIR filters are parallel multiply/add operations and circular addressing. The former allows a multiplication and an addition to be performed in a single machine cycle, while the latter makes a finite buffer of length N sufficient for the data x.
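As a point of reference, the same computation can be sketched in C. This is a minimal sketch, not a translation of Example 6-5: the struct `fir_state` and the function `fir_step` are hypothetical names, and the modulo arithmetic on `pos` stands in for the circular addressing that the ’C3x performs in hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FIR state: a circular delay line standing in for the
   BK-sized circular buffer that AR1 walks on the 'C3x. */
typedef struct {
    const float *h;   /* h[0..n-1], impulse response */
    float *x;         /* last n input samples, stored circularly */
    size_t n;         /* number of taps (the role of BK) */
    size_t pos;       /* slot that receives the next input */
} fir_state;

/* Store one input sample and return y[n] = sum of h[k] * x[n-k]. */
static float fir_step(fir_state *s, float input)
{
    assert(s->n > 0 && s->pos < s->n);
    s->x[s->pos] = input;                       /* overwrite the oldest sample */
    float acc = 0.0f;
    size_t idx = s->pos;                        /* newest sample, x[n] */
    for (size_t k = 0; k < s->n; k++) {
        acc += s->h[k] * s->x[idx];             /* one multiply/add per tap */
        idx = (idx == 0) ? s->n - 1 : idx - 1;  /* step to x[n-k-1], wrapping */
    }
    s->pos = (s->pos + 1) % s->n;               /* advance circularly */
    return acc;
}
```

Each call overwrites the oldest sample in place, so the buffer never has to be shifted; with h = {1, 0.5, 0.25}, a unit impulse produces 1, 0.5, 0.25, 0.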

Figure 6-1 shows the arrangement of memory locations necessary to implement circular addressing, while Example 6-5 presents the ’C3x assembly code for an FIR filter.

Figure 6-1. Data Memory Organization for an FIR Filter 



                Impulse      Initial input                 Final input
                response     samples                       samples
Low address     h(N-1)       x(n-(N-1))   Oldest input     x(n)
                h(N-2)       x(n-(N-2))                    x(n-(N-1))
                  .            .                             .
                  .            .                             .
                h(1)         x(n-1)                        x(n-2)
High address    h(0)         x(n)         Newest input     x(n-1)

(The input samples are stored in a circular buffer.)



To set up circular addressing, initialize the block-size register BK to the block length N. Start the locations for signal x at a memory address that is a multiple of the smallest power of 2 that is greater than N. For instance, if N = 24, the first address for x must be a multiple of 32 (the lowest five bits of the beginning address are 0). See the Circular Addressing section in the Addressing chapter of the TMS320C3x User’s Guide for more information.
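The alignment rule can be checked mechanically. The helpers below are a hypothetical sketch (the names `circ_alignment` and `circ_base_ok` are illustrative, and addresses are modeled as plain 32-bit integers): one computes the smallest power of 2 strictly greater than the block length, and the other tests whether a candidate start address is a multiple of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of 2 strictly greater than n. */
static uint32_t circ_alignment(uint32_t n)
{
    assert(n > 0 && n < 0x80000000u);   /* keep the result within 32 bits */
    uint32_t a = 1;
    while (a <= n)
        a <<= 1;
    return a;
}

/* Hypothetical helper: nonzero if addr can start a circular buffer of length n. */
static int circ_base_ok(uint32_t addr, uint32_t n)
{
    return (addr & (circ_alignment(n) - 1)) == 0;
}
```

For example, `circ_alignment(24)` is 32, matching the N = 24 case above, while `circ_alignment(32)` is 64 because the power of 2 must be strictly greater than the block length.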


In Example 6-5, the pointer to the input sequence x is incremented and is assumed to be moving from an older input to a newer input. At the end of the subroutine, AR1 points to the position for the next input sample.


Example 6-5. FIR Filter 


* TITLE FIR FILTER
*
* SUBROUTINE FIR
*
* EQUATION: y(n) = h(0) * x(n) + h(1) * x(n-1) +
*                  ... + h(N-1) * x(n-(N-1))
*
* TYPICAL CALLING SEQUENCE:
*
*    load AR0
*    load AR1
*    load RC
*    load BK
*    CALL FIR
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+----------------------------
*    AR0    | ADDRESS OF h(N-1)
*    AR1    | ADDRESS OF x(n-(N-1))
*    RC     | LENGTH OF FILTER - 2 (N-2)
*    BK     | LENGTH OF FILTER (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC, BK
* REGISTERS MODIFIED: R0, R2, AR0, AR1, RC
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 11 + (N-1)   WORDS: 6
*
        .global FIR
*
*       Initialize R0:
*
FIR     MPYF3  *AR0++(1),*AR1++(1)%,R0   ; h(N-1) * x(n-(N-1)) -> R0
        LDF    0.0,R2                    ; Initialize R2
*
*       FILTER (1 <= i < N)
*
        RPTS   RC                        ; Set up the repeat cycle
        MPYF3  *AR0++(1),*AR1++(1)%,R0   ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3  R0,R2,R2                  ; Multiply and add operation
*
        ADDF   R0,R2,R0                  ; Add last product
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end




6.2.2 IIR Filters


The transfer function of IIR filters has both poles and zeros. Its output depends on both the input and the past output. As a rule, IIR filters need less computation than an FIR filter with a similar frequency response, but they have the drawback of being sensitive to coefficient quantization. Most often, IIR filters are implemented as a cascade of second-order sections, called biquads. Example 6-6 shows the implementation for one biquad.

This is the equation for a single biquad:

$$y[n] = a_1\,y[n-1] + a_2\,y[n-2] + b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2]$$

However, the following two equations are more convenient and have smaller storage requirements:

$$d[n] = a_2\,d[n-2] + a_1\,d[n-1] + x[n]$$

$$y[n] = b_2\,d[n-2] + b_1\,d[n-1] + b_0\,d[n]$$

Figure 6-2 shows the memory organization for this two-equation approach, 
and Example 6-7 shows the implementation for any number of biquads. 
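The two-equation form can be sketched in C for a single biquad. This is a hypothetical illustration, not the listing in Example 6-6: `biquad_coef` and `biquad_step` are illustrative names, and the coefficient fields follow the order of Figure 6-2. As in the equations above, the a terms are added (some texts subtract them instead). Only the two delay-node values d(n-1) and d(n-2) are carried between calls.

```c
#include <assert.h>

/* Hypothetical coefficient record, in the memory order of Figure 6-2. */
typedef struct { float a2, b2, a1, b1, b0; } biquad_coef;

/* d[0] holds d(n-1) and d[1] holds d(n-2); both are updated on return. */
static float biquad_step(const biquad_coef *c, float d[2], float x)
{
    assert(c != 0 && d != 0);
    float dn = c->a2 * d[1] + c->a1 * d[0] + x;           /* d(n) */
    float y  = c->b2 * d[1] + c->b1 * d[0] + c->b0 * dn;  /* y(n) */
    d[1] = d[0];                                          /* d(n-1) becomes d(n-2) */
    d[0] = dn;                                            /* d(n) becomes d(n-1) */
    return y;
}
```

With a1 = 0.5, a2 = -0.25, b0 = 1, b1 = 0.5, and b2 = 0.25, a unit impulse produces 1, 1, 0.5, which agrees with the single-equation form above.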
Figure 6-2. Data Memory Organization for a Single Biquad


                Filter          Initial delay           Final delay
                coefficients    node values             node values
Low address     a2              d(n)    Newest delay    d(n-1)
                b2              d(n-1)                  d(n-2)
                a1              d(n-2)  Oldest delay    d(n)
                b1
High address    b0

(The delay node values are stored in a circular queue.)


As in the case of FIR filters, the address for the start of the d values must be 
a multiple of 4; that is, the last two bits of the beginning address must be 0. The 
block-size register BK must be initialized to 3. 


Example 6-6. IIR Filter (One Biquad)

* TITLE IIR FILTER
*
* SUBROUTINE IIR1
*
* IIR1 == IIR FILTER (ONE BIQUAD)
*
* EQUATIONS: d(n) = a2 * d(n-2) + a1 * d(n-1) + x(n)
*
*            y(n) = b2 * d(n-2) + b1 * d(n-1) + b0 * d(n)
*
*        OR  y(n) = a1*y(n-1) + a2*y(n-2) + b0*x(n)
*                   + b1*x(n-1) + b2*x(n-2)
*
* TYPICAL CALLING SEQUENCE:
*
*    load R2
*    load AR0
*    load AR1
*    load BK
*    CALL IIR1
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------------
*    R2     | INPUT SAMPLE x(n)
*    AR0    | ADDRESS OF FILTER COEFFICIENTS (a2)
*    AR1    | ADDRESS OF DELAY NODE VALUES (d(n-2))
*    BK     | BK = 3
*
* REGISTERS USED AS INPUT: R2, AR0, AR1, BK
* REGISTERS MODIFIED: R0, R1, R2, AR0, AR1
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 11   WORDS: 8
*
*       FILTER
*
        .global IIR1
*
IIR1    MPYF3  *AR0,*AR1,R0              ; a2 * d(n-2) -> R0
        MPYF3  *++AR0(1),*AR1--(1)%,R1   ; b2 * d(n-2) -> R1
*
        MPYF3  *++AR0(1),*AR1,R0         ; a1 * d(n-1) -> R0
||      ADDF3  R0,R2,R2                  ; a2*d(n-2)+x(n) -> R2
*
        MPYF3  *++AR0(1),*AR1--(1)%,R0   ; b1 * d(n-1) -> R0
||      ADDF3  R0,R2,R2                  ; a1*d(n-1)+a2*d(n-2)+x(n) -> R2
*
        MPYF3  *++AR0(1),R2,R2           ; b0 * d(n) -> R2
||      STF    R2,*AR1++(1)%             ; Store d(n) and point to d(n-1)
*
        ADDF   R0,R2                     ; b1*d(n-1)+b0*d(n) -> R2
        ADDF   R1,R2,R0                  ; b2*d(n-2)+b1*d(n-1)+b0*d(n) -> R0
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end
In the more general case, the IIR filter contains N > 1 biquads, and the output of each biquad is the input of the next. The equations for its implementation are given by the following pseudo-C code:

y[-1,n] = x[n]
for (i = 0; i < N; i++) {
    d[i,n] = a2[i] d[i,n-2] + a1[i] d[i,n-1] + y[i-1,n]
    y[i,n] = b2[i] d[i,n-2] + b1[i] d[i,n-1] + b0[i] d[i,n]
}
y[n] = y[N-1,n]

Figure 6-3 shows the corresponding memory organization, while Example 6-7 
shows the ’C3x assembly-language code. 
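A C rendering of the cascade is sketched below. It is illustrative only: `bq_coef` and `iir_cascade` are hypothetical names, and the delay nodes of biquad i are kept as the pair d[2i] = d(i,n-1), d[2i+1] = d(i,n-2) rather than in the 4-word circular queues used by the ’C3x code. The running variable `y` plays the role of y[i-1,n], starting from the input sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficient record for one section: a2, b2, a1, b1, b0. */
typedef struct { float a2, b2, a1, b1, b0; } bq_coef;

/* Run one input sample through nsec cascaded biquads; returns y(N-1,n). */
static float iir_cascade(const bq_coef *c, float *d, size_t nsec, float x)
{
    assert(nsec > 0);
    float y = x;                                    /* y(-1,n) = x(n) */
    for (size_t i = 0; i < nsec; i++) {
        float *di = &d[2 * i];                      /* di[0] = d(i,n-1), di[1] = d(i,n-2) */
        float dn = c[i].a2 * di[1] + c[i].a1 * di[0] + y;       /* d(i,n) */
        y = c[i].b2 * di[1] + c[i].b1 * di[0] + c[i].b0 * dn;   /* y(i,n) */
        di[1] = di[0];                              /* shift this section's delay line */
        di[0] = dn;
    }
    return y;
}
```

For example, if the first biquad is a pure one-sample delay (b1 = 1, all other coefficients 0) and the second is a gain of 0.5 (b0 = 0.5), an impulse produces the outputs 0, 0.5, 0.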


Figure 6-3. Data Memory Organization for N Biquads 


                Filter          Initial delay             Final delay
                coefficients    node values               node values
Low address     a2(0)           d(0,n)      Newest delay  d(0,n-1)
                b2(0)           d(0,n-1)                  d(0,n-2)
                a1(0)           d(0,n-2)    Oldest delay  d(0,n)
                b1(0)           Empty                     Empty
                b0(0)             .                         .
                  .               .                         .
                a2(N-1)         d(N-1,n)                  d(N-1,n-1)
                b2(N-1)         d(N-1,n-1)                d(N-1,n-2)
                a1(N-1)         d(N-1,n-2)                d(N-1,n)
                b1(N-1)         Empty                     Empty
High address    b0(N-1)

(Each group of four delay-node locations is a separate circular queue.)


You must initialize the block-size register BK to 3; the beginning of each set of d values (that is, d[i,n], i = 0, ..., N-1) must be at an address that is a multiple of 4 (where the last two bits are 0).
Example 6-7. IIR Filters (N > 1 Biquads)

* TITLE IIR FILTERS (N > 1 BIQUADS)
*
* SUBROUTINE IIR2
*
* EQUATIONS: y(-1,n) = x(n)
*
*            FOR (i = 0; i < N; i++)
*            {
*               d(i,n) = a2(i) * d(i,n-2) + a1(i) * d(i,n-1) + y(i-1,n)
*               y(i,n) = b2(i) * d(i,n-2) + b1(i) * d(i,n-1) + b0(i) * d(i,n)
*            }
*
*            y(n) = y(N-1,n)
*
* TYPICAL CALLING SEQUENCE:
*
*    load R2
*    load AR0
*    load AR1
*    load IR0
*    load IR1
*    load BK
*    load RC
*    CALL IIR2
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*    R2     | INPUT SAMPLE x(n)
*    AR0    | ADDRESS OF FILTER COEFFICIENTS (a2(0))
*    AR1    | ADDRESS OF DELAY NODE VALUES (d(0,n-2))
*    BK     | BK = 3
*    IR0    | IR0 = 4
*    IR1    | IR1 = 4*N-4
*    RC     | NUMBER OF BIQUADS (N) - 2
*
* REGISTERS USED AS INPUT: R2, AR0, AR1, IR0, IR1, BK, RC
* REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 17 + 6N   WORDS: 17
*
        .global IIR2
*
IIR2    MPYF3  *AR0,*AR1,R0              ; a2(0) * d(0,n-2) -> R0
        MPYF3  *++AR0(1),*AR1--(1)%,R1   ; b2(0) * d(0,n-2) -> R1
*
        MPYF3  *++AR0(1),*AR1,R0         ; a1(0) * d(0,n-1) -> R0
||      ADDF3  R0,R2,R2                  ; First sum term of d(0,n)
*
        MPYF3  *++AR0(1),*AR1--(1)%,R0   ; b1(0) * d(0,n-1) -> R0
||      ADDF3  R0,R2,R2                  ; Second sum term of d(0,n)
*
        MPYF3  *++AR0(1),R2,R2           ; b0(0) * d(0,n) -> R2
||      STF    R2,*AR1--(1)%             ; Store d(0,n); point to d(0,n-2)
*
        RPTB   LOOP                      ; Loop for 1 <= i < N
*
        MPYF3  *++AR0(1),*++AR1(IR0),R0  ; a2(i) * d(i,n-2) -> R0
||      ADDF3  R0,R2,R2                  ; First sum term of y(i-1,n)
*
        MPYF3  *++AR0(1),*AR1--(1)%,R1   ; b2(i) * d(i,n-2) -> R1
||      ADDF3  R1,R2,R2                  ; Second sum term of y(i-1,n)
*
        MPYF3  *++AR0(1),*AR1,R0         ; a1(i) * d(i,n-1) -> R0
||      ADDF3  R0,R2,R2                  ; First sum term of d(i,n)
*
        MPYF3  *++AR0(1),*AR1--(1)%,R0   ; b1(i) * d(i,n-1) -> R0
||      ADDF3  R0,R2,R2                  ; Second sum term of d(i,n)
*
LOOP    MPYF3  *++AR0(1),R2,R2           ; b0(i) * d(i,n) -> R2
||      STF    R2,*AR1--(1)%             ; Store d(i,n); point to d(i,n-2)
*
*       FINAL SUMMATION
*
        ADDF   R0,R2                     ; First sum term of y(N-1,n)
        ADDF3  R1,R2,R0                  ; Second sum term of y(N-1,n)
*
        NOP    *AR1--(IR1)               ; Return to first biquad
        NOP    *AR1--(1)%                ; Point to d(0,n-1)
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end



6.2.3 Adaptive Filters (Least Mean Squares Algorithm) 

In some digital signal processing applications, you must adapt a filter over time to keep track of changing conditions. This is accomplished by updating the filter coefficients with a least mean squares (LMS) algorithm. The equations for this process are described below.

The book Theory and Design of Adaptive Filters presents the theory of adaptive filters. Although, in theory, both FIR and IIR structures can be used as adaptive filters, the stability problems and the local optimum points that IIR filters exhibit make them less attractive for such an application. Hence, until further research makes IIR filters a better choice, only FIR filters are used in practical applications of adaptive algorithms.

In an adaptive FIR filter, the filtering equation takes this form:

$$y[n] = h[n,0]\,x[n] + h[n,1]\,x[n-1] + \cdots + h[n,N-1]\,x[n-(N-1)]$$

The filter coefficients are time-dependent and are updated through LMS algorithms. In an LMS algorithm, the coefficients are updated by an equation of this form:

$$h[n+1,i] = h[n,i] + \mu\, e[n]\, x[n-i], \qquad i = 0, 1, \ldots, N-1$$

where e[n] = d[n] - y[n] is the error, μ is a constant for the computation, and d[n] is the desired signal. You can interleave the updating of the filter coefficients with the computation of the filter output so that doing both takes three cycles per filter tap. The updated coefficients are written over the old filter coefficients.
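A plain C sketch of one LMS step follows, assuming floating-point arrays and the hypothetical names `lms_step`, `h`, and `x`. It computes e(n) from the current output before updating, exactly as the equations are written. Example 6-8 instead receives a precomputed scale factor (2 mu * err) in R4 so that the update can be interleaved with the filtering, which means its update uses an error value supplied by the caller rather than the error of the output being computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LMS step on an n-tap adaptive FIR filter.
   x[i] holds x(n-i) (newest first); h[i] holds h(n,i) and is updated in place.
   Returns y(n); *err receives e(n) = d(n) - y(n). */
static float lms_step(float *h, const float *x, size_t n,
                      float desired, float mu, float *err)
{
    assert(n > 0 && err != 0);
    float y = 0.0f;
    for (size_t i = 0; i < n; i++)
        y += h[i] * x[i];                 /* filter with the current coefficients */
    float e = desired - y;                /* e(n) = d(n) - y(n) */
    for (size_t i = 0; i < n; i++)
        h[i] += mu * e * x[i];            /* h(n+1,i) = h(n,i) + mu e(n) x(n-i) */
    *err = e;
    return y;
}
```

Starting from h = {0, 0} with x = {1, 0.5}, d = 1, and mu = 0.5, the first call returns 0 and moves h to {0.5, 0.25}; repeating the call returns 0.625, so the error shrinks from 1 to 0.375.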
Example 6-8 shows the implementation of an adaptive FIR filter on the ’C3x. The memory organization and the positioning of the data in memory follow the same rules that apply to the FIR filter described in section 6.2.1 on page 6-7.


Example 6-8. Adaptive FIR Filter (LMS Algorithm) 


* LMS == LMS ADAPTIVE FILTER
*
* EQUATIONS: y(n) = h(n,0)*x(n) + h(n,1)*x(n-1)
*                   + ... + h(n,N-1)*x(n-(N-1))
*
*            e(n) = d(n) - y(n)
*
*            for (i = 0; i < N; i++)
*               h(n+1,i) = h(n,i) + mu * e(n) * x(n-i)
*
* TYPICAL CALLING SEQUENCE:
*
*    load R4
*    load AR0
*    load AR1
*    load AR6
*    load RC
*    load BK
*    CALL LMS
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+-----------------------------
*    R4     | scale factor (2 mu * err)
*    AR0    | address of h(n,N-1)
*    AR1    | address of x(n-(N-1))
*    AR6    | address of d(n)
*    RC     | length of filter - 2 (N-2)
*    BK     | length of filter (N)
*
* REGISTERS USED AS INPUT: R4, AR0, AR1, AR6, RC, BK
* REGISTERS MODIFIED: R0, R1, R2, R5, AR0, AR1, AR6, RC
* REGISTER CONTAINING RESULT: R0
*
* PROGRAM SIZE: 11 words
* EXECUTION CYCLES: 13 + 3N
*
        .global LMS
        .text
*
*       SETUP (i = 0)
*
LMS     LDF    *AR6++(1),R5              ; Get desired sample d(n)
        MPYF3  *AR0,*AR1,R0              ; h(n,N-1) * x(n-(N-1)) -> R0
||      SUBF3  R2,R2,R2                  ; Initialize R2 to 0
*
*       Initialize R1:
*
        MPYF3  *AR1++(1)%,R4,R1          ; x(n-(N-1)) * tmuerr -> R1
        ADDF3  *AR0++(1),R1,R1           ; h(n,N-1) + x(n-(N-1)) * tmuerr -> R1
*
*       FILTER AND UPDATE (1 <= i < N)
*
        RPTB   LOOP                      ; Set up the repeat block
*
*       Filter:
*
        MPYF3  *AR0--(1),*AR1,R0         ; h(n,N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3  R0,R2,R2                  ; Multiply and add operation
*
*       Update:
*
        MPYF3  *AR1++(1)%,R4,R1          ; x(n-(N-1-i)) * tmuerr -> R1
||      STF    R1,*AR0++(1)              ; R1 -> h(n+1,N-1-(i-1))
LOOP    ADDF3  *AR0++(1),R1,R1           ; h(n,N-1-i) + x(n-(N-1-i))
*                                        ; * tmuerr -> R1
*
        ADDF   R0,R2,R0                  ; Add last product
        STF    R1,*-AR0(1)               ; h(n,0) + x(n) * tmuerr
*                                        ; -> h(n+1,0)
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end
6.3 Lattice Filters


The lattice form is an alternative way of implementing digital filters. It has found 
applications in speech processing, spectral estimation, and other areas. In this 
discussion, the notation and terminology from speech processing applications 
are used. 

If H(z) is the transfer function of a digital filter that has only poles, A(z) = 1/H(z) is a filter that has only zeros and is called the inverse filter. The inverse lattice filter is shown in Figure 6-4. These equations describe the filter in mathematical terms:

$$f(i,n) = f(i-1,n) + k(i)\, b(i-1,n-1)$$

$$b(i,n) = b(i-1,n-1) + k(i)\, f(i-1,n)$$

Initial conditions:

$$f(0,n) = b(0,n) = x(n)$$

Final conditions:

$$y(n) = f(p,n)$$

In the above equations, f(i,n) is the forward error, b(i,n) is the backward error, k(i) is the i-th reflection coefficient, x(n) is the input, and y(n) is the output signal. The order of the filter (that is, the number of stages) is p. In the linear predictive coding (LPC) method of speech processing, the inverse lattice filter is used during analysis, and the (forward) lattice filter is used during speech synthesis.
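The recursion can be sketched in C as follows. The function name `lattice_inverse` and the array layout are assumptions for illustration: `k[0..p-1]` holds k(1) through k(p), and `bprev[0..p-1]` holds b(0,n-1) through b(p-1,n-1), mirroring the memory organization shown in Figure 6-5. Each stage reads b(i-1,n-1) before overwriting that slot with b(i-1,n) for the next call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inverse (all-zero) lattice filter of order p.
   k[i-1] = k(i); bprev[i] = b(i,n-1) on entry and b(i,n) on return.
   Returns y(n) = f(p,n). */
static float lattice_inverse(const float *k, float *bprev, size_t p, float x)
{
    assert(p > 0);
    float f = x;                              /* f(0,n) */
    float b = x;                              /* b(0,n) */
    for (size_t i = 1; i <= p; i++) {
        float bold = bprev[i - 1];            /* b(i-1,n-1) */
        bprev[i - 1] = b;                     /* save b(i-1,n) for the next call */
        float fnew = f + k[i - 1] * bold;     /* f(i,n) */
        b = bold + k[i - 1] * f;              /* b(i,n) */
        f = fnew;
    }
    return f;
}
```

With k(1) = 0.5 and k(2) = 0.25, a unit impulse produces 1, 0.625, 0.25, so the filter realizes A(z) = 1 + 0.625 z^-1 + 0.25 z^-2.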

Figure 6-4. Structure of the Inverse Lattice Filter



Figure 6-5 shows the data memory organization of the inverse lattice filter on 
the ’C3x. 
























Lattice Filters 


Figure 6-5. Data Memory Organization for Forward and Inverse Lattice Filters 


                Reflection      Backward
                coefficients    propagation terms
Low address     k(1)            b(0,n-1)
                k(2)            b(1,n-1)
                  .               .
                  .               .
High address    k(p)            b(p-1,n-1)


Example 6-9 shows the implementation of an inverse lattice filter. 


Example 6-9. Inverse Lattice Filter

* TITLE INVERSE LATTICE FILTER
*
* SUBROUTINE LATINV
*
* LATINV == LATTICE FILTER (LPC INVERSE FILTER - ANALYSIS)
*
* TYPICAL CALLING SEQUENCE:
*
*    load R2
*    load AR0
*    load AR1
*    load RC
*    CALL LATINV
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+----------------------------------------------------
*    R2     | f(0,n) = x(n)
*    AR0    | ADDRESS OF FILTER COEFFICIENTS (k(1))
*    AR1    | ADDRESS OF BACKWARD PROPAGATION VALUES (b(0,n-1))
*    RC     | RC = p - 2
*
* REGISTERS USED AS INPUT: R2, AR0, AR1, RC
* REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
* REGISTER CONTAINING RESULT: R2 (f(p,n))
*
* PROGRAM SIZE: 10 WORDS
* EXECUTION CYCLES: 13 + 3 * (p-1)
*
        .global LATINV
*
*       i = 1
*
LATINV  MPYF3  *AR0,*AR1,R0              ; k(1) * b(0,n-1) -> R0
*                                        ; Assume f(0,n) -> R2.
        LDF    R2,R3                     ; Put b(0,n) = f(0,n) -> R3.
        MPYF3  *AR0++(1),R2,R1           ; k(1) * f(0,n) -> R1
*
*       2 <= i <= p
*
        RPTB   LOOP
        MPYF3  *AR0,*++AR1(1),R0         ; k(i) * b(i-1,n-1) -> R0
||      ADDF3  R2,R0,R2                  ; f(i-2,n)+k(i-1)*b(i-2,n-1) = f(i-1,n) -> R2
        ADDF3  *-AR1(1),R1,R3            ; b(i-2,n-1)+k(i-1)*f(i-2,n) = b(i-1,n) -> R3
||      STF    R3,*-AR1(1)               ; b(i-2,n) -> b(i-2,n-1)
LOOP    MPYF3  *AR0++(1),R2,R1           ; k(i) * f(i-1,n) -> R1
*
*       i = p+1 (CLEANUP)
*
        ADDF3  R2,R0,R2                  ; f(p-1,n)+k(p)*b(p-1,n-1) = f(p,n) -> R2
        ADDF3  *AR1,R1,R3                ; b(p-1,n-1)+k(p)*f(p-1,n) = b(p,n) -> R3
||      STF    R3,*AR1                   ; b(p-1,n) -> b(p-1,n-1)
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end
The forward lattice filter is similar in structure to the inverse filter, as shown in Figure 6-6.

Figure 6-6. Structure of the (Forward) Lattice Filter



These equations describe the lattice filter:

$$f(i-1,n) = f(i,n) - k(i)\, b(i-1,n-1)$$

$$b(i,n) = b(i-1,n-1) + k(i)\, f(i-1,n)$$

Initial conditions:

$$f(p,n) = x(n), \qquad b(i,n-1) = 0 \ \text{for}\ i = 1, \ldots, p$$

Final conditions:

$$y(n) = f(0,n)$$

The data memory organization is identical to that of the inverse filter, as shown 
in Figure 6-5 on page 6-19. Example 6-10 shows the implementation of the 
lattice filter on the ’C3x. 
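A C sketch of the forward recursion, under the same assumed layout as Figure 6-5 (`k[i-1]` = k(i), `bprev[i]` = b(i,n-1)), is shown below; `lattice_forward` is a hypothetical name. The stages run from i = p down to 1, and each b(i,n) is written only after the old b(i,n-1) in that slot has been consumed by the preceding iteration (stage i+1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical forward (all-pole) lattice filter of order p.
   k[i-1] = k(i); bprev[i] = b(i,n-1) on entry and b(i,n) on return.
   Returns y(n) = f(0,n). */
static float lattice_forward(const float *k, float *bprev, size_t p, float x)
{
    assert(p > 0);
    float f = x;                                      /* f(p,n) = x(n) */
    for (size_t i = p; i >= 1; i--) {
        f -= k[i - 1] * bprev[i - 1];                 /* f(i-1,n) */
        if (i < p)
            bprev[i] = bprev[i - 1] + k[i - 1] * f;   /* b(i,n) */
    }
    bprev[0] = f;                                     /* b(0,n) = f(0,n) */
    return f;                                         /* y(n) = f(0,n) */
}
```

Because the forward filter undoes the inverse filter, feeding it 1, 0.625, 0.25 (the impulse response of the inverse lattice with k(1) = 0.5 and k(2) = 0.25, worked out from the equations) returns the original impulse 1, 0, 0.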
Example 6-10. Lattice Filter

* TITLE LATTICE FILTER
*
* SUBROUTINE LATICE
*
* TYPICAL CALLING SEQUENCE:
*
*    LOAD R2
*    LOAD AR0
*    LOAD AR1
*    LOAD IR0
*    LOAD RC
*    CALL LATICE
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+------------------------------------------------------
*    R2     | F(P,N) = E(N) = EXCITATION
*    AR0    | ADDRESS OF FILTER COEFFICIENTS (K(P))
*    AR1    | ADDRESS OF BACKWARD PROPAGATION VALUES (B(P-1,N-1))
*    IR0    | 3
*    RC     | RC = P - 3
*
* REGISTERS USED AS INPUT: R2, AR0, AR1, IR0, RC
* REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
* REGISTER CONTAINING RESULT: R2 (F(0,N))
*
* STACK USAGE: NONE
* PROGRAM SIZE: 12 WORDS
* EXECUTION CYCLES: 15 + 3 * (P-2)
*
        .global LATICE
*
LATICE  MPYF3  *AR0,*AR1,R0              ; K(P) * B(P-1,N-1) -> R0
*                                        ; Assume F(P,N) -> R2
        SUBF3  R0,R2,R2                  ; F(P,N)-K(P)*B(P-1,N-1) = F(P-1,N) -> R2
||      MPYF3  *--AR0(1),*--AR1(1),R0    ; K(P-1) * B(P-2,N-1) -> R0
        SUBF3  R0,R2,R2                  ; F(P-1,N)-K(P-1)*B(P-2,N-1) = F(P-2,N) -> R2
||      MPYF3  *--AR0(1),*--AR1(1),R0    ; K(P-2) * B(P-3,N-1) -> R0
        MPYF3  R2,*+AR0(1),R1            ; F(P-2,N) * K(P-1) -> R1
        ADDF3  R1,*+AR1(1),R3            ; F(P-2,N)*K(P-1)+B(P-2,N-1) = B(P-1,N) -> R3
*
*       1 <= I <= P-2
*
        RPTB   LOOP
        SUBF3  R0,R2,R2                  ; F(I,N)-K(I)*B(I-1,N-1) = F(I-1,N) -> R2
||      MPYF3  *--AR0(1),*--AR1(1),R0    ; K(I-1) * B(I-2,N-1) -> R0
        STF    R3,*+AR1(IR0)             ; B(I+1,N) -> B(I+1,N-1)
||      MPYF3  R2,*+AR0(1),R1            ; F(I-1,N) * K(I) -> R1
LOOP    ADDF3  R1,*+AR1(1),R3            ; F(I-1,N)*K(I)+B(I-1,N-1) = B(I,N) -> R3
*
        STF    R3,*+AR1(2)               ; B(1,N) -> B(1,N-1)
        STF    R2,*+AR1(1)               ; F(0,N) -> B(0,N-1)
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       END
*
        .end





DSP Algorithms 
















Matrix-Vector Multiplication 


6.4 Matrix-Vector Multiplication 

In matrix-vector multiplication, a K x N matrix of elements m(i,j) having K rows and N columns is multiplied by an N x 1 vector to produce a K x 1 result. The multiplier vector has elements v(j), and the product vector has elements p(i). Each of the product-vector elements is computed by the following expression:

$$p(i) = m(i,0)\,v(0) + m(i,1)\,v(1) + \cdots + m(i,N-1)\,v(N-1), \qquad i = 0, 1, \ldots, K-1$$

This is essentially a dot product, and the matrix-vector multiplication contains, 
as a special case, the dot product presented in Example 2-1 on page 2-3. In 
pseudo-C format, the computation of the matrix multiplication is expressed by: 

for (i = 0; i < K; i++) {
    p(i) = 0
    for (j = 0; j < N; j++)
        p(i) = p(i) + m(i,j) * v(j)
}

Figure 6-7 shows the data memory organization for matrix-vector multiplication, and Example 6-11 shows the ’C3x assembly code that implements it. Note that in Example 6-11, K (number of rows) must be greater than 0 and N (number of columns) must be greater than 1.
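For comparison, a direct C version of the pseudo-code is sketched below. The name `mat_vec` and the row-major storage (m(i,j) at m[i*N + j], so that the elements of a row are consecutive, as the continuous AR0 walk in Example 6-11 suggests) are assumptions. Unlike Example 6-11, this loop has no N > 1 restriction, because it does not rely on a repeat count of N - 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: p = M v for a K x N matrix stored row by row. */
static void mat_vec(const float *m, const float *v, float *p,
                    size_t K, size_t N)
{
    assert(K > 0 && N > 0);
    for (size_t i = 0; i < K; i++) {
        float acc = 0.0f;                   /* p(i) = 0 */
        for (size_t j = 0; j < N; j++)
            acc += m[i * N + j] * v[j];     /* p(i) += m(i,j) * v(j) */
        p[i] = acc;
    }
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) and v = (1, 2, 3), the result is p = (14, 32).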

Figure 6-7. Data Memory Organization for Matrix-Vector Multiplication


Example 6-11. Matrix Times a Vector Multiplication 


* TITLE MATRIX TIMES A VECTOR MULTIPLICATION
*
* SUBROUTINE MAT
*
* MAT == MATRIX TIMES A VECTOR OPERATION
*
* TYPICAL CALLING SEQUENCE:
*
*    load AR0
*    load AR1
*    load AR2
*    load AR3
*    load R1
*    CALL MAT
*
* ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------
*    AR0    | ADDRESS OF M(0,0)
*    AR1    | ADDRESS OF V(0)
*    AR2    | ADDRESS OF P(0)
*    AR3    | NUMBER OF ROWS - 1 (K-1)
*    R1     | NUMBER OF COLUMNS - 2 (N-2)
*
* REGISTERS USED AS INPUT: AR0, AR1, AR2, AR3, R1
* REGISTERS MODIFIED: R0, R2, AR0, AR1, AR2, AR3, IR0, RC, RS, RE
*
* PROGRAM SIZE: 11
* EXECUTION CYCLES: 6 + 10 * K + K * (N-1)
*
        .global MAT
*
*       SETUP
*
MAT     LDI    R1,IR0                    ; Number of columns-2 -> IR0
        ADDI   2,IR0                     ; IR0 = N
*
*       FOR (i = 0; i < K; i++)  LOOP OVER THE ROWS
*
ROWS    LDF    0.0,R2                    ; Initialize R2
        MPYF3  *AR0++(1),*AR1++(1),R0    ; m(i,0) * v(0) -> R0
*
*       FOR (j = 1; j < N; j++)  DO DOT PRODUCT OVER COLUMNS
*
        RPTS   R1                        ; Multiply a row by a column
        MPYF3  *AR0++(1),*AR1++(1),R0    ; m(i,j) * v(j) -> R0
||      ADDF3  R0,R2,R2                  ; m(i,j-1) * v(j-1) + R2 -> R2
*
        DBD    AR3,ROWS                  ; Counts the no. of rows left
        ADDF   R0,R2                     ; Last accumulate
        STF    R2,*AR2++(1)              ; Result -> p(i)
        NOP    *--AR1(IR0)               ; Set AR1 to point to v(0)
*
* !!! DELAYED BRANCH HAPPENS HERE !!!
*
*       RETURN SEQUENCE
*
        RETS                             ; Return
*
*       end
*
        .end







6.5 Vector Maximum Search 

In vector maximum search, a vector of N elements is searched for its greatest 
element: 

$$\max_{0 \le i \le N-1} \{p(i)\}$$

In pseudo-C format, the search is expressed by:

max = p[0];
max_location = 0;
for (i = 1; i < N; i++) {
    if (max < p[i]) {
        max = p[i];
        max_location = i;
    }
}


Example 6-12 shows a ’C3x implementation of the search.
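The pseudo-code translates directly to C. This is a hypothetical helper (`vec_max` is an illustrative name, not part of any TI library). Like the pseudo-code, it keeps the first (lowest-index) occurrence when the maximum value appears more than once, which is also what the backward scan with LE conditions in Example 6-12 produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest element of p[0..n-1]; *loc receives its index. */
static float vec_max(const float *p, size_t n, size_t *loc)
{
    assert(n > 0 && loc != 0);
    float max = p[0];                  /* start from a real element, not 0 */
    *loc = 0;
    for (size_t i = 1; i < n; i++) {
        if (max < p[i]) {              /* strict: ties keep the earlier index */
            max = p[i];
            *loc = i;
        }
    }
    return max;
}
```

Starting from p[0] rather than 0 keeps the result correct when every element is negative.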
Vector Maximum Search


Example 6-12. vecmax.asm 


; Vector maximum search
; EQUATIONS: max = max {p(i)}
;
; TYPICAL CALLING SEQUENCE:
;
;    load AR0
;    load RC
;    load R1
;    CALL vecmax
;
; ARGUMENT ASSIGNMENTS:
;
;  argument | function
;  ---------+--------------------------------------
;    AR0    | address of the last element p(N-1)
;    RC     | length of vector - 2 (N-2)
;    R1     | length of vector - 1 (N-1)
;
; REGISTERS USED AS INPUT: AR0, R1, RC
; REGISTERS MODIFIED: R0, R1, AR0, RC
; REGISTER CONTAINING RESULT:
;    R0   maximum value
;    R1   index of maximum value
;
; PROGRAM SIZE: 5 words (not counting rets)
; EXECUTION CYCLES: 2 + 3N (not counting rets)
;
        .text
        .global vecmax
vecmax  ldf    *ar0--(1),r0      ; Start with the last value
        rptb   loop
        cmpf3  *ar0,r0           ; Compare input value to maximum
        ldile  rc,r1             ; Write index of loop
loop    ldfle  *ar0--(1),r0      ; Load new max value
        rets                     ; Return
        .end
Fast Fourier Transforms (FFTs)


6.6 Fast Fourier Transforms (FFTs) 

Fourier transforms are an important tool often used in digital signal processing (DSP) systems. The purpose of the transform is to convert information from the time domain to the frequency domain. The inverse Fourier transform converts information back to the time domain from the frequency domain. Computationally efficient implementations of Fourier transforms are known as fast Fourier transforms (FFTs). The theory of FFTs can be found in books such as DFT/FFT and Convolution Algorithms and Digital Signal Processing Applications With the TMS320 Family.

Fast Fourier transform is a label for a collection of algorithms that implement 
efficient conversion from time to frequency domain. Distinctions are made 
among FFTs based on the following characteristics: 

□ Radix-2 or radix-4 algorithms (depending on the size of the FFT butterfly) 

□ Decimation in time or frequency (DIT or DIF) 

□ Complex or real FFTs 

□ FFT length, etc. 

Certain ’C3x features that increase the efficiency of numerically intensive algorithms are particularly well suited for FFTs. The high speed of the device (33-ns cycle time) makes implementation of real-time algorithms easier, while floating-point capability eliminates the problems associated with dynamic range. The powerful indirect-addressing indexing scheme facilitates access to FFT butterfly legs with different spans. The repeat block implemented by the RPTB instruction reduces the looping overhead in algorithms that depend heavily on loops (such as FFTs); this construct provides the efficiency of in-line coding in loop form. The FFT produces its output in bit-reversed order; therefore, the output must be reordered. This reordering does not require extra cycles, because the device has a special mode of indirect addressing (bit-reversed addressing) for accessing the FFT output in the original order.
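The reordering that bit-reversed addressing performs can be illustrated in C. This hypothetical helper (`bit_reverse` is an illustrative name) reverses the low `bits` bits of an index, where N = 2^bits, so that output slot i of an in-place radix-2 FFT holds X(bit_reverse(i, bits)). It models only the index permutation, not the ’C3x address generator itself.

```c
#include <assert.h>

/* Hypothetical helper: reverse the low 'bits' bits of index i. */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT (bits = 3), slots 0 through 7 hold X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7).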

The examples in this section are based on programs contained in the DFT/FFT 
and Convolution Algorithms book and in the paper Real-Valued Fast Fourier 
Transform Algorithms. 


6.6.1 FFT Definition 

The FFT is an efficient implementation of the discrete Fourier transform (DFT) equation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$

The inverse DFT equation is:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-nk}, \qquad n = 0, 1, \ldots, N-1$$

where $W_N^{nk} = e^{-j 2\pi nk/N}$. The FFT takes advantage of the periodic nature of this complex exponential to reduce redundancy and the number of calculations. The FFT expresses the original DFT using two smaller DFTs of length N/2. This definition is applied repeatedly until the original DFT has been expressed in terms of 2-point DFTs; this form is normally referred to as a radix-2 FFT.

There are two ways this decomposition process occurs: 

□ By decimation in time, where the signals are split into several shorter interleaved sequences (see Figure 6-8).

□ By decimation in frequency, where the signals are split into several smaller interleaved frequency components (see Figure 6-9).
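For decimation in frequency, the halving step can be written out explicitly. With $W_N = e^{-j2\pi/N}$, splitting the output bins into even and odd indices (a standard identity) gives two N/2-point DFTs:

$$X(2k) = \sum_{n=0}^{N/2-1} \bigl[x(n) + x(n + N/2)\bigr]\, W_{N/2}^{nk}$$

$$X(2k+1) = \sum_{n=0}^{N/2-1} \bigl[x(n) - x(n + N/2)\bigr]\, W_N^{n}\, W_{N/2}^{nk}, \qquad k = 0, 1, \ldots, N/2 - 1$$

The sum x(n) + x(n + N/2) and the twiddle-weighted difference [x(n) - x(n + N/2)] W_N^n are the two outputs of one DIF butterfly; applying the same split to each half log2(N) times yields the complete radix-2 DIF FFT, with the outputs in bit-reversed order.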


Figure 6-8. Decimation in Time for an 8-Point FFT 




Figure 6-9. Decimation in Frequency for an 8-Point FFT


6.6.2 Complex Radix-2 DIF FFT 

Example 6-13 and Example 6-14 show the implementation of a complex radix-2 DIF FFT on the ’C3x. Example 6-13 contains the generic code of the FFT, which can be used with an FFT of any length. However, for the complete implementation of an FFT, you need a table of twiddle factors (sines/cosines); the length of the table depends on the size of the transform. A table of twiddle factors (containing 1¼ complete cycles of a sine) for a 64-point FFT is presented separately in Example 6-14, which preserves the generic form of the radix-2 DIF FFT in Example 6-13. One full cycle of the sine wave must contain as many samples as the length of the FFT. Example 6-14 uses two variables: N, which is the FFT length, and M, which is the logarithm of N to a base equal to the radix. In other words, M is the number of stages of the FFT. For example, in a 64-point FFT, M = 6 when using a radix-2 algorithm, and M = 3 when using a radix-4 algorithm. If the table of twiddle factors and the FFT code are kept in separate files, they are connected at link time.
Example 6-13. Complex Radix-2 DIF FFT

*   TITLE COMPLEX, RADIX-2 DIF FFT
*
*   GENERIC PROGRAM FOR LOOPED-CODE RADIX-2 FFT COMPUTATION IN TMS320C3x
*
*   THE PROGRAM IS TAKEN FROM THE BURRUS AND PARKS BOOK, P. 111.
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS
*   DONE IN PLACE, BUT THE RESULT IS MOVED TO ANOTHER MEMORY
*   SECTION TO DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE THAT IS PUT IN A .DATA
*   SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE
*   GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF
*   THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED
*   DURING LINKING.
*
        .globl  FFT             ; Entry point for execution
        .globl  N               ; FFT size
        .globl  M               ; LOG2(N)
        .globl  SINE            ; Address of sine table
INP     .usect  "IN",1024       ; Memory with input data
        .bss    OUTP,1024       ; Memory with output data
        .text
*
* INITIALIZE
*
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP
OUTPUT  .word   OUTP

FFT:    LDP     FFTSIZ          ; Command to load data page pointer
        LDI     @FFTSIZ,IR1
        LSH     -2,IR1          ; IR1 = N/4, pointer for SIN/COS table
        LDI     0,AR6           ; AR6 holds the current stage number
        LDI     @FFTSIZ,IR0
        LSH     1,IR0           ; IR0 = 2*N1 (because of real/imag)
        LDI     @FFTSIZ,R7
        LDI     1,AR7           ; Initialize repeat counter
                                ; of first loop
        LDI     1,AR5           ; Initialize IE index (AR5 = IE)
*
* OUTER LOOP
*
LOOP:   NOP     *++AR6(1)       ; Current FFT stage
        LDI     @INPUT,AR0      ; AR0 points to X(I)
        ADDI    R7,AR0,AR2      ; AR2 points to X(L)
        LDI     AR7,RC
        SUBI    1,RC            ; RC should be one less than desired #
*
* FIRST LOOP
*
        RPTB    BLK1
        ADDF    *AR0,*AR2,R0    ; R0 = X(I)+X(L)
        SUBF    *AR2++,*AR0++,R1 ; R1 = X(I)-X(L)
        ADDF    *AR2,*AR0,R2    ; R2 = Y(I)+Y(L)
        SUBF    *AR2,*AR0,R3    ; R3 = Y(I)-Y(L)
        STF     R2,*AR0--       ; Y(I) = R2 and...
||      STF     R3,*AR2--       ; Y(L) = R3
BLK1    STF     R0,*AR0++(IR0)  ; X(I) = R0 and...
||      STF     R1,*AR2++(IR0)  ; X(L) = R1 and AR0,AR2 = AR0,AR2 + 2*N1
*
* IF THIS IS THE LAST STAGE, YOU ARE DONE
*
        CMPI    @LOGFFT,AR6
        BZD     END
*
* MAIN INNER LOOP
*
        LDI     2,AR1           ; Init loop counter for inner loop
        LDI     @SINTAB,AR4     ; Initialize IA index (AR4 = IA)
INLOP:  ADDI    AR5,AR4         ; IA = IA+IE; AR4 points to sine
        LDI     AR1,AR0
        ADDI    2,AR1           ; Increment inner loop counter
        ADDI    @INPUT,AR0      ; (X(I),Y(I)) pointer
        ADDI    R7,AR0,AR2      ; (X(L),Y(L)) pointer
        LDI     AR7,RC
        SUBI    1,RC            ; RC should be 1 less than desired #
        LDF     *AR4,R6         ; R6 = SIN
*
* SECOND LOOP
*
        RPTB    BLK2
        SUBF    *AR2,*AR0,R2    ; R2 = X(I)-X(L)
        SUBF    *+AR2,*+AR0,R1  ; R1 = Y(I)-Y(L)
        MPYF    R2,R6,R0        ; R0 = R2*SIN and...
||      ADDF    *+AR2,*+AR0,R3  ; R3 = Y(I)+Y(L)
        MPYF    R1,*+AR4(IR1),R3 ; R3 = R1*COS and...
||      STF     R3,*+AR0        ; Y(I) = Y(I)+Y(L)
        SUBF    R0,R3,R4        ; R4 = R1*COS-R2*SIN
        MPYF    R1,R6,R0        ; R0 = R1*SIN and...
||      ADDF    *AR2,*AR0,R3    ; R3 = X(I)+X(L)
        MPYF    R2,*+AR4(IR1),R3 ; R3 = R2*COS and...
||      STF     R3,*AR0++(IR0)  ; X(I) = X(I)+X(L) and AR0 = AR0+2*N1
        ADDF    R0,R3,R5        ; R5 = R2*COS+R1*SIN
BLK2    STF     R5,*AR2++(IR0)  ; X(L) = R2*COS+R1*SIN, incr AR2 and...
||      STF     R4,*+AR2        ; Y(L) = R1*COS-R2*SIN
        CMPI    R7,AR1
        BNE     INLOP           ; Loop back to the inner loop
        LSH     1,AR7           ; Increment loop counter for next time
        BRD     LOOP            ; Next FFT stage (delayed)
        LSH     1,AR5           ; IE = 2*IE
        LDI     R7,IR0          ; N1 = N2
        LSH     -1,R7           ; N2 = N2/2
*
* STORE RESULT OUT USING BIT-REVERSED ADDRESSING
*
END:    LDI     @FFTSIZ,RC      ; RC = N
        SUBI    1,RC            ; RC should be one less than desired #
        LDI     @FFTSIZ,IR0     ; IR0 = size of FFT = N
        LDI     2,IR1
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        RPTB    BITRV
        LDF     *+AR0(1),R0
||      LDF     *AR0++(IR0)B,R1
BITRV   STF     R0,*+AR1(1)
||      STF     R1,*AR1++(IR1)
SELF    BR      SELF            ; Branch to itself at the end
        .end
Example 6-14. Table With Twiddle Factors for a 64-Point FFT

*   TITLE TABLE WITH TWIDDLE FACTORS FOR A 64-POINT FFT
*
*   FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT, RADIX-2 FFT
*
        .globl  SINE
        .globl  N
        .globl  M
N       .set    64
M       .set    6
        .data
SINE    .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185
COSINE  .float  1.000000
        .float  0.995185
        .float  0.980785
        .float  0.956940
        .float  0.923880
        .float  0.881921
        .float  0.831470
        .float  0.773010
        .float  0.707107
        .float  0.634393
        .float  0.555570
        .float  0.471397
        .float  0.382683
        .float  0.290285
        .float  0.195090
        .float  0.098017
        .float  0.000000
        .float  -0.098017
        .float  -0.195090
        .float  -0.290285
        .float  -0.382683
        .float  -0.471397
        .float  -0.555570
        .float  -0.634393
        .float  -0.707107
        .float  -0.773010
        .float  -0.831470
        .float  -0.881921
        .float  -0.923880
        .float  -0.956940
        .float  -0.980785
        .float  -0.995185
        .float  -1.000000
        .float  -0.995185
        .float  -0.980785
        .float  -0.956940
        .float  -0.923880
        .float  -0.881921
        .float  -0.831470
        .float  -0.773010
        .float  -0.707107
        .float  -0.634393
        .float  -0.555570
        .float  -0.471397
        .float  -0.382683
        .float  -0.290285
        .float  -0.195090
        .float  -0.098017
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185




6.6.3 Complex Radix-4 DIF FFT 

The radix-2 algorithm has tutorial value because the functioning of the FFT algorithm is relatively easy to understand. However, a radix-4 implementation can increase execution speed by reducing the amount of arithmetic required. Example 6-15 shows the generic implementation of a complex DIF FFT in radix-4. A companion table, such as the one in Example 6-14, must define M as log4(N), the logarithm of N to base 4; for a 64-point FFT, M = 3 instead of 6.


Example 6-15. Complex Radix-4 DIF FFT

*   TITLE COMPLEX, RADIX-4, DIF FFT
*
*   GENERIC PROGRAM TO PERFORM A LOOPED-CODE RADIX-4 FFT COMPUTATION
*   IN THE TMS320C3x
*
*   THE PROGRAM IS TAKEN FROM THE BURRUS AND PARKS BOOK, P. 117.
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY, AND THE COMPUTATION
*   IS DONE IN PLACE.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE THAT IS PUT IN A .DATA
*   SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE
*   GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF
*   THE FFT N AND LOG4(N) ARE DEFINED IN A .GLOBL DIRECTIVE AND
*   SPECIFIED DURING LINKING.
*
*   IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE TWO
*   MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED DURING
*   STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE PROGRAM IN
*   P. 117 OF THE BURRUS AND PARKS BOOK.
*
        .globl  FFT             ; Entry point for execution
        .globl  N               ; FFT size
        .globl  M               ; LOG4(N)
        .globl  SINE            ; Address of sine table
INP     .usect  "IN",1024       ; Memory with input data
        .text
*
* INITIALIZE
*
TEMP    .word   $+2
STORE   .word   FFTSIZ          ; Beginning of temp storage
        .word   N
        .word   M
        .word   SINE
        .word   INP
        .bss    FFTSIZ,1        ; FFT size
        .bss    LOGFFT,1        ; LOG4(FFTSIZ)
        .bss    SINTAB,1        ; Sine/cosine table base
        .bss    INPUT,1         ; Area with input data to process
        .bss    STAGE,1         ; FFT stage #
        .bss    RPTCNT,1        ; Repeat counter
        .bss    IEINDX,1        ; IE index for sine/cosine
        .bss    LPCNT,1         ; Second-loop count
        .bss    JT,1            ; JT counter in program, P. 117
        .bss    IA1,1           ; IA1 index in program, P. 117
FFT:
*
* INITIALIZE DATA LOCATIONS
*
        LDP     TEMP            ; Command to load data page pointer
        LDI     @TEMP,AR0
        LDI     @STORE,AR1
        LDI     *AR0++,R0       ; Xfer data from one memory to the other
        STI     R0,*AR1++
        LDI     *AR0++,R0
        STI     R0,*AR1++
        LDI     *AR0++,R0
        STI     R0,*AR1++
        LDI     *AR0,R0
        STI     R0,*AR1
        LDP     FFTSIZ          ; Command to load data page pointer
        LDI     @FFTSIZ,R0
        LDI     @FFTSIZ,IR0
        LDI     @FFTSIZ,IR1
        LDI     0,AR7
        STI     AR7,@STAGE      ; @STAGE holds the current stage number
        LSH     1,IR0           ; IR0 = 2*N1 (because of real/imag)
        LSH     -2,IR1          ; IR1 = N/4, pointer for SIN/COS table
        LDI     1,AR7
        STI     AR7,@RPTCNT     ; Init repeat counter of first loop
        STI     AR7,@IEINDX     ; Init. IE index
        LSH     -2,R0           ; JT = R0/2+2
        ADDI    2,R0
        STI     R0,@JT
        SUBI    2,R0
        LSH     1,R0            ; R0 = N2
*
* OUTER LOOP
*
LOOP:   LDI     @INPUT,AR0      ; AR0 points to X(I)
        ADDI    R0,AR0,AR1      ; AR1 points to X(I1)
        ADDI    R0,AR1,AR2      ; AR2 points to X(I2)
        ADDI    R0,AR2,AR3      ; AR3 points to X(I3)
        LDI     @RPTCNT,RC
        SUBI    1,RC            ; RC should be one less than desired #
*
* FIRST LOOP
*
        RPTB    BLK1
        ADDF    *+AR0,*+AR2,R1  ; R1 = Y(I)+Y(I2)
        ADDF    *+AR3,*+AR1,R3  ; R3 = Y(I1)+Y(I3)
        ADDF    R3,R1,R6        ; R6 = R1+R3
        SUBF    *+AR2,*+AR0,R4  ; R4 = Y(I)-Y(I2)
        STF     R6,*+AR0        ; Y(I) = R1+R3
        SUBF    R3,R1           ; R1 = R1-R3
        LDF     *AR2,R5         ; R5 = X(I2)
||      LDF     *+AR1,R7        ; R7 = Y(I1)
        ADDF    *AR3,*AR1,R3    ; R3 = X(I1)+X(I3)
        ADDF    R5,*AR0,R1      ; R1 = X(I)+X(I2)
||      STF     R1,*+AR1        ; Y(I1) = R1-R3
        ADDF    R3,R1,R6        ; R6 = R1+R3
        SUBF    R5,*AR0,R2      ; R2 = X(I)-X(I2)
||      STF     R6,*AR0++(IR0)  ; X(I) = R1+R3
        SUBF    R3,R1           ; R1 = R1-R3
        SUBF    *AR3,*AR1,R6    ; R6 = X(I1)-X(I3)
        SUBF    R7,*+AR3,R3     ; -R3 = Y(I1)-Y(I3)
||      STF     R1,*AR1++(IR0)  ; X(I1) = R1-R3
        SUBF    R6,R4,R5        ; R5 = R4-R6
        ADDF    R6,R4           ; R4 = R4+R6
        STF     R5,*+AR2        ; Y(I2) = R4-R6
||      STF     R4,*+AR3        ; Y(I3) = R4+R6
        SUBF    R3,R2,R5        ; R5 = R2-R3
        ADDF    R3,R2           ; R2 = R2+R3
BLK1    STF     R5,*AR2++(IR0)  ; X(I2) = R2-R3
||      STF     R2,*AR3++(IR0)  ; X(I3) = R2+R3
*
* IF THIS IS THE LAST STAGE, YOU ARE DONE
*
        LDI     @STAGE,AR7
        ADDI    1,AR7
        CMPI    @LOGFFT,AR7
        BZD     END
        STI     AR7,@STAGE      ; Current FFT stage
*
* MAIN INNER LOOP
*
        LDI     1,AR7
        STI     AR7,@IA1        ; Init IA1 index
        LDI     2,AR7
        STI     AR7,@LPCNT      ; Init loop counter for inner loop
INLOP:  LDI     2,AR6           ; Increment inner loop counter
        ADDI    @LPCNT,AR6
        LDI     @LPCNT,AR0
        LDI     @IA1,AR7
        ADDI    @IEINDX,AR7     ; IA1 = IA1+IE
        ADDI    @INPUT,AR0      ; (X(I),Y(I)) pointer
        STI     AR7,@IA1
        ADDI    R0,AR0,AR1      ; (X(I1),Y(I1)) pointer
        STI     AR6,@LPCNT
        ADDI    R0,AR1,AR2      ; (X(I2),Y(I2)) pointer
        ADDI    R0,AR2,AR3      ; (X(I3),Y(I3)) pointer
        LDI     @RPTCNT,RC
        SUBI    1,RC            ; RC should be one less than desired #
        CMPI    @JT,AR6         ; If LPCNT = JT, go to
        BZD     SPCL            ; special butterfly
        LDI     @IA1,AR7
        LDI     @IA1,AR4
        ADDI    @SINTAB,AR4     ; Create cosine index AR4
        SUBI    1,AR4           ; Adjust sine table pointer
        ADDI    AR4,AR7,AR5
        SUBI    1,AR5           ; IA2 = IA1+IA1-1
        ADDI    AR7,AR5,AR6
        SUBI    1,AR6           ; IA3 = IA2+IA1-1
*
* SECOND LOOP
*
        RPTB    BLK2
        ADDF    *+AR2,*+AR0,R3  ; R3 = Y(I)+Y(I2)
        ADDF    *+AR3,*+AR1,R5  ; R5 = Y(I1)+Y(I3)
        ADDF    R5,R3,R6        ; R6 = R3+R5
        SUBF    *+AR2,*+AR0,R4  ; R4 = Y(I)-Y(I2)
        SUBF    R5,R3           ; R3 = R3-R5
        ADDF    *AR2,*AR0,R1    ; R1 = X(I)+X(I2)
        ADDF    *AR3,*AR1,R5    ; R5 = X(I1)+X(I3)
        MPYF    R3,*+AR5(IR1),R6 ; R6 = R3*CO2
||      STF     R6,*+AR0        ; Y(I) = R3+R5
        ADDF    R5,R1,R7        ; R7 = R1+R5
        SUBF    *AR2,*AR0,R2    ; R2 = X(I)-X(I2)
        SUBF    R5,R1           ; R1 = R1-R5
        MPYF    R1,*AR5,R7      ; R7 = R1*SI2
||      STF     R7,*AR0++(IR0)  ; X(I) = R1+R5
        SUBF    R7,R6           ; R6 = R3*CO2-R1*SI2
        SUBF    *+AR3,*+AR1,R5  ; R5 = Y(I1)-Y(I3)
        MPYF    R1,*+AR5(IR1),R7 ; R7 = R1*CO2
||      STF     R6,*+AR1        ; Y(I1) = R3*CO2-R1*SI2
        MPYF    R3,*AR5,R6      ; R6 = R3*SI2
        ADDF    R7,R6           ; R6 = R1*CO2+R3*SI2
        ADDF    R5,R2,R1        ; R1 = R2+R5
        SUBF    R5,R2           ; R2 = R2-R5
        SUBF    *AR3,*AR1,R5    ; R5 = X(I1)-X(I3)
        SUBF    R5,R4,R3        ; R3 = R4-R5
        ADDF    R5,R4           ; R4 = R4+R5
        MPYF    R3,*+AR4(IR1),R6 ; R6 = R3*CO1
||      STF     R6,*AR1++(IR0)  ; X(I1) = R1*CO2+R3*SI2
        MPYF    R1,*AR4,R7      ; R7 = R1*SI1
        SUBF    R7,R6           ; R6 = R3*CO1-R1*SI1
        MPYF    R1,*+AR4(IR1),R6 ; R6 = R1*CO1
||      STF     R6,*+AR2        ; Y(I2) = R3*CO1-R1*SI1
        MPYF    R3,*AR4,R7      ; R7 = R3*SI1
        ADDF    R7,R6           ; R6 = R1*CO1+R3*SI1
        MPYF    R4,*+AR6(IR1),R6 ; R6 = R4*CO3
||      STF     R6,*AR2++(IR0)  ; X(I2) = R1*CO1+R3*SI1
        MPYF    R2,*AR6,R7      ; R7 = R2*SI3
        SUBF    R7,R6           ; R6 = R4*CO3-R2*SI3
        MPYF    R2,*+AR6(IR1),R6 ; R6 = R2*CO3
||      STF     R6,*+AR3        ; Y(I3) = R4*CO3-R2*SI3
        MPYF    R4,*AR6,R7      ; R7 = R4*SI3
        ADDF    R7,R6           ; R6 = R2*CO3+R4*SI3
BLK2    STF     R6,*AR3++(IR0)  ; X(I3) = R2*CO3+R4*SI3
        CMPI    @LPCNT,R0
        BP      INLOP           ; Loop back to the inner loop
        BR      CONT
*
* SPECIAL BUTTERFLY FOR W = J
*
SPCL    LDI     IR1,AR4
        LSH     -1,AR4          ; Point to SIN(45)
        ADDI    @SINTAB,AR4     ; Create cosine index AR4 = CO21
        RPTB    BLK3
        ADDF    *AR2,*AR0,R1    ; R1 = X(I)+X(I2)
        SUBF    *AR2,*AR0,R2    ; R2 = X(I)-X(I2)
        ADDF    *+AR2,*+AR0,R3  ; R3 = Y(I)+Y(I2)
        SUBF    *+AR2,*+AR0,R4  ; R4 = Y(I)-Y(I2)
        ADDF    *AR3,*AR1,R5    ; R5 = X(I1)+X(I3)
        SUBF    R1,R5,R6        ; R6 = R5-R1
        ADDF    R5,R1           ; R1 = R1+R5
        ADDF    *+AR3,*+AR1,R5  ; R5 = Y(I1)+Y(I3)
        SUBF    R5,R3,R7        ; R7 = R3-R5
        ADDF    R5,R3           ; R3 = R3+R5
        STF     R3,*+AR0        ; Y(I) = R3+R5
||      STF     R1,*AR0++(IR0)  ; X(I) = R1+R5
        SUBF    *AR3,*AR1,R1    ; R1 = X(I1)-X(I3)
        SUBF    *+AR3,*+AR1,R3  ; R3 = Y(I1)-Y(I3)
        STF     R6,*+AR1        ; Y(I1) = R5-R1
||      STF     R7,*AR1++(IR0)  ; X(I1) = R3-R5
        ADDF    R3,R2,R5        ; R5 = R2+R3
        SUBF    R2,R3,R2        ; R2 = -R2+R3
        SUBF    R1,R4,R3        ; R3 = R4-R1
        ADDF    R1,R4           ; R4 = R4+R1
        SUBF    R5,R3,R1        ; R1 = R3-R5
        MPYF    *AR4,R1         ; R1 = R1*CO21
        ADDF    R5,R3           ; R3 = R3+R5
        MPYF    *AR4,R3         ; R3 = R3*CO21
||      STF     R1,*+AR2        ; Y(I2) = (R3-R5)*CO21
        SUBF    R4,R2,R1        ; R1 = R2-R4
        MPYF    *AR4,R1         ; R1 = R1*CO21
||      STF     R3,*AR2++(IR0)  ; X(I2) = (R3+R5)*CO21
        ADDF    R4,R2           ; R2 = R2+R4
        MPYF    *AR4,R2         ; R2 = R2*CO21
BLK3    STF     R1,*+AR3        ; Y(I3) = -(R4-R2)*CO21
||      STF     R2,*AR3++(IR0)  ; X(I3) = (R4+R2)*CO21
        CMPI    @LPCNT,R0
        BPD     INLOP           ; Loop back to the inner loop
CONT    LDI     @RPTCNT,AR7
        LDI     @IEINDX,AR6
        LSH     2,AR7           ; Increment repeat counter for
        STI     AR7,@RPTCNT     ; next time
        LSH     2,AR6           ; IE = 4*IE
        STI     AR6,@IEINDX
        LDI     R0,IR0          ; N1 = N2
        LSH     -3,R0
        ADDI    2,R0
        STI     R0,@JT          ; JT = N2/2+2
        SUBI    2,R0
        LSH     1,R0            ; N2 = N2/4
        BR      LOOP            ; Next FFT stage
*
* STORE RESULT USING BIT-REVERSED ADDRESSING
*
END:    LDI     @FFTSIZ,RC      ; RC = N
        SUBI    1,RC            ; RC should be one less than desired #
        LDI     @FFTSIZ,IR0     ; IR0 = size of FFT = N
        LDI     2,IR1
        LDI     @INPUT,AR0
        LDP     STORE
        LDI     @STORE,AR1
        RPTB    BITRV
        LDF     *+AR0(1),R0
||      LDF     *AR0++(IR0)B,R1
BITRV   STF     R0,*+AR1(1)
||      STF     R1,*AR1++(IR1)
SELF    BR      SELF            ; Branch to itself at the end
        .end
6.6.4 Real Radix-2 FFT

In many applications, the data to be transformed is a sequence of real numbers. Real input data has properties that reduce the computational load of the FFT algorithm even further. The radix-2 FFT algorithm that exploits these properties is called a real radix-2 FFT. Example 6-16 shows the generic implementation of a real-valued, forward radix-2 FFT. For such an FFT, a length-N transform needs only N storage locations; a complex FFT needs 2N. Only R(0) through R(N/2) and I(1) through I(N/2 - 1) are stored; the remaining points are recovered from the conjugate symmetry of the spectrum of a real sequence, X(N - k) = X*(k).


Example 6-16. Real Forward Radix-2 FFT

*****************************************************************************
*
*   FILENAME     : ffft_rl.asm
*
*   WRITTEN BY   : Alex Tessarolo
*                  Texas Instruments, Australia
*
*   DATE         : 23rd July 1991
*
*   VERSION      : 2.0
*
*****************************************************************************
*
*   VER   DATE            COMMENTS
*
*   1.0   18th July 91    Original release.
*
*   2.0   23rd July 91    Most stages modified.
*                         Minimum FFT size increased from 32 to 64.
*                         Faster in-place bit reversing algorithm.
*                         Program size increased by about 100 words.
*                         One extra data word required.
*
*****************************************************************************
*
*   SYNOPSIS: int ffft_rl( FFT_SIZE, LOG_SIZE, SOURCE_ADDR, DEST_ADDR,
*                          SINE_TABLE, BIT_REVERSE );
*
*       int    FFT_SIZE       ; 64, 128, 256, 512, 1024, ...
*       int    LOG_SIZE       ; 6, 7, 8, 9, 10, ...
*       float  *SOURCE_ADDR   ; Points to location of source data.
*       float  *DEST_ADDR     ; Points to where data will be
*                             ; operated on and stored.
*       float  *SINE_TABLE    ; Points to the SIN/COS table.
*       int    BIT_REVERSE    ; = 0, bit reversing is disabled.
*                             ; <> 0, bit reversing is enabled.
*
*   NOTE: 1) If SOURCE_ADDR = DEST_ADDR, then in-place bit
*            reversing is performed, if enabled (more
*            processor intensive).
*
*         2) FFT_SIZE must be >= 64 (this is not checked).
*
*****************************************************************************
*
*   DESCRIPTION: Generic function to do a radix-2 FFT computation on the
*                C30. The data array is FFT_SIZE-long with only real
*                data. The output is stored in the same locations with
*                real and imaginary points R and I as follows:
*
*                DEST_ADDR[0]            -> R(0)
*                                           R(1)
*                                           R(2)
*                                           R(3)
*                                            .
*                                            .
*                                           R(FFT_SIZE/2)
*                                           I(FFT_SIZE/2 - 1)
*                                            .
*                                            .
*                                           I(2)
*                DEST_ADDR[FFT_SIZE - 1] -> I(1)
*
*                The program is based on the FORTRAN program in the
*                paper by Sorensen et al., June 1987 issue of Trans.
*                on ASSP.
*
*                Bit reversal is optionally implemented at the
*                beginning of the function.
*
*                If bit reversal is not selected (BIT_REVERSE = 0), the
*                data input is expected in bit-reversed order.
*
*                The sine/cosine table for the twiddle factors is
*                expected to be supplied in the following format:
*
*                SINE_TABLE[0]              -> sin(0*2*pi/FFT_SIZE)
*                                              sin(1*2*pi/FFT_SIZE)
*                                               .
*                                               .
*                                              sin((FFT_SIZE/2-2)*2*pi/FFT_SIZE)
*                SINE_TABLE[FFT_SIZE/2 - 1] -> sin((FFT_SIZE/2-1)*2*pi/FFT_SIZE)
*
*                NOTE: The table is the first half period of a sine wave.
*
*   Stack structure upon call:
*
*       -FP(7)   BIT_REVERSE
*       -FP(6)   SINE_TABLE
*       -FP(5)   DEST_ADDR
*       -FP(4)   SOURCE_ADDR
*       -FP(3)   LOG_SIZE
*       -FP(2)   FFT_SIZE
*       -FP(1)   return addr
*       -FP(0)   old FP
*
*****************************************************************************
*
*   NOTE:     Calling C program can be compiled using either large or
*             small model.
*
*   WARNING:  DP initialized only once in the program. Be wary with
*             interrupt service routines. Make sure interrupt service
*             routines save the DP pointer.
*
*   WARNING:  The DEST_ADDR must be aligned such that the first LOG_SIZE
*             bits (the least significant bits) are zero (this is not
*             checked by the program).
*
*****************************************************************************
*
*   REGISTERS USED:  R0, R1, R2, R3, R4, R5, R6, R7
*                    AR0, AR1, AR2, AR3, AR4, AR5, AR6, AR7
*                    IR0, IR1
*                    RC, RS, RE
*                    DP
*
*   MEMORY REQUIREMENTS:  Program = 405 Words (approximately)
*                         Data    = 7 Words
*                         Stack   = 12 Words
*
*****************************************************************************
*
*   BENCHMARKS:  Assumptions - Program in RAM0
*                            - Reserved data in RAM0
*                            - Stack on primary/expansion bus RAM
*                            - Sine/cosine tables in RAM0
*                            - Processing and data destination in RAM1
*                            - Primary/expansion bus RAM, 0 wait state
*
*       FFT Size   Bit Reversing   Data Source   Cycles (C30)
*       1024       OFF             RAM1          19816 approx.
*
*   Note: This number does not include the C callable overheads.
*         Add 57 cycles for these overheads.
*
*****************************************************************************

FP              .set    AR3
                .global _ffft_rl            ; Entry execution point.

FFT_SIZE:       .usect  ".fftdata",1        ; Reserve memory for arguments.
LOG_SIZE:       .usect  ".fftdata",1
SOURCE_ADDR:    .usect  ".fftdata",1
DEST_ADDR:      .usect  ".fftdata",1
SINE_TABLE:     .usect  ".fftdata",1
BIT_REVERSE:    .usect  ".fftdata",1
SEPARATION:     .usect  ".fftdata",1
;
; Initialize C function.
;
                .sect   ".ffttext"
_ffft_rl:       PUSH    FP                  ; Preserve C environment.
                LDI     SP,FP
                PUSH    R4
                PUSH    R5
                PUSH    R6
                PUSHF   R6
                PUSH    R7
                PUSHF   R7
                PUSH    AR4
                PUSH    AR5
                PUSH    AR6
                PUSH    AR7
                PUSH    DP
                LDP     FFT_SIZE            ; Init. DP pointer.
                LDI     *-FP(2),R0          ; Move arguments from stack.
                STI     R0,@FFT_SIZE
                LDI     *-FP(3),R0
                STI     R0,@LOG_SIZE
                LDI     *-FP(4),R0
                STI     R0,@SOURCE_ADDR
                LDI     *-FP(5),R0
                STI     R0,@DEST_ADDR
                LDI     *-FP(6),R0
                STI     R0,@SINE_TABLE
                LDI     *-FP(7),R0
                STI     R0,@BIT_REVERSE
;
; Check bit reversing mode (on or off).
;
; BIT_REVERSING =  0, then OFF (no bit reversing).
; BIT_REVERSING <> 0, then ON.
;
                LDI     @BIT_REVERSE,R0
                CMPI    0,R0
                BZ      MOVE_DATA
;
; Check bit reversing type.
;
; If SourceAddr =  DestAddr, then in place bit reversing.
; If SourceAddr <> DestAddr, then standard bit reversing.
;
                LDI     @SOURCE_ADDR,R0
                CMPI    @DEST_ADDR,R0
                BEQ     IN_PLACE
;
; Bit reversing Type 1 (from source to destination).
;
; NOTE: abs(SOURCE_ADDR - DEST_ADDR) must be > FFT_SIZE, this is not
;       checked.
;
                LDI     @FFT_SIZE,R0
                SUBI    2,R0
                LDI     @FFT_SIZE,IR0
                LSH     -1,IR0              ; IR0 = half FFT size.
                LDI     @SOURCE_ADDR,AR0
                LDI     @DEST_ADDR,AR1
                LDF     *AR0++,R1
                RPTS    R0
                LDF     *AR0++,R1
||              STF     R1,*AR1++(IR0)B
                STF     R1,*AR1++(IR0)B
                BR      START
;
; In-place bit reversing.
;
; Bit reversing on even locations, 1st half only.
;
IN_PLACE:       LDI     @FFT_SIZE,IR0
                LSH     -2,IR0              ; IR0 = quarter FFT size.
                LDI     2,IR1
                LDI     @FFT_SIZE,RC
                LSH     -2,RC
                SUBI    3,RC
                LDI     @DEST_ADDR,AR0
                LDI     AR0,AR1
                LDI     AR0,AR2
                NOP     *AR1++(IR0)B
                NOP     *AR2++(IR0)B
                LDF     *++AR0(IR1),R0
                LDF     *AR1,R1
                CMPI    AR1,AR0             ; Xchange locs only if AR0<AR1.
                LDFGT   R0,R1
                LDFGT   *AR1++(IR0)B,R1
                RPTB    BITRV1
                LDF     *++AR0(IR1),R0
||              STF     R0,*AR0
                LDF     *AR1,R1
||              STF     R1,*AR2++(IR0)B
                CMPI    AR1,AR0
                LDFGT   R0,R1
BITRV1:         LDFGT   *AR1++(IR0)B,R0
                STF     R0,*AR0
                STF     R1,*AR2
;
; Perform bit reversing on odd locations, 2nd half only.
;
                LDI     @FFT_SIZE,RC
                LSH     -1,RC
                LDI     @DEST_ADDR,AR0
                ADDI    RC,AR0
                ADDI    1,AR0
                LDI     AR0,AR1
                LDI     AR0,AR2
                LSH     -1,RC
                SUBI    3,RC
                NOP     *AR1++(IR0)B
                NOP     *AR2++(IR0)B
                LDF     *++AR0(IR1),R0
                LDF     *AR1,R1
                CMPI    AR1,AR0             ; Xchange locs only if AR0<AR1.
                LDFGT   R0,R1
                LDFGT   *AR1++(IR0)B,R1
                RPTB    BITRV2
                LDF     *++AR0(IR1),R0
||              STF     R0,*AR0
                LDF     *AR1,R1
||              STF     R1,*AR2++(IR0)B
                CMPI    AR1,AR0
                LDFGT   R0,R1
BITRV2:         LDFGT   *AR1++(IR0)B,R0
                STF     R0,*AR0
                STF     R1,*AR2
;
; Perform bit reversing on odd locations, 1st half only.
;


                LDI     @FFT_SIZE,RC
                LSH     -1,RC
                LDI     RC,IR0
                LDI     @DEST_ADDR,AR0
                LDI     AR0,AR1
                ADDI    1,AR0
                ADDI    IR0,AR1
                LSH     -1,RC
                LDI     RC,IR0
                SUBI    2,RC
                LDF     *AR0,R0
                LDF     *AR1,R1
                RPTB    BITRV3
                LDF     *++AR0(IR1),R0
||              STF     R0,*AR1++(IR0)B
BITRV3:         LDF     *AR1,R1
||              STF     R1,*-AR0(IR1)
                STF     R0,*AR1
                STF     R1,*AR0
                BR      START
;
; Check data source locations.
;
; If SourceAddr =  DestAddr, then do nothing.
; If SourceAddr <> DestAddr, then move data.
;
MOVE_DATA:      LDI     @SOURCE_ADDR,R0
                CMPI    @DEST_ADDR,R0
                BEQ     START
                LDI     @FFT_SIZE,R0
                SUBI    2,R0
                LDI     @SOURCE_ADDR,AR0
                LDI     @DEST_ADDR,AR1
                LDF     *AR0++,R1
                RPTS    R0
                LDF     *AR0++,R1
||              STF     R1,*AR1++
                STF     R1,*AR1
;
; Perform first and second FFT loops.
;
;   AR1 -> I1   0  <- [X(I1) + X(I2)] + [X(I3) + X(I4)]
;   AR2 -> I2   1  <- [X(I1) - X(I2)]
;   AR3 -> I3   2  <- [X(I1) + X(I2)] - [X(I3) + X(I4)]
;   AR4 -> I4   3  <- -[X(I3) - X(I4)]
;   AR1 ->      4
;
START:          LDI     @DEST_ADDR,AR1
                LDI     AR1,AR2
                LDI     AR1,AR3
                LDI     AR1,AR4
                ADDI    1,AR2
                ADDI    2,AR3
                ADDI    3,AR4
                LDI     4,IR0
                LDI     @FFT_SIZE,RC
                LSH     -2,RC
                SUBI    2,RC
                LDF     *AR2,R0
                LDF     *AR3,R1
                ADDF3   R1,*AR4,R4
                SUBF3   R1,*AR4++(IR0),R5
                SUBF3   R0,*AR1,R6
                ADDF3   R0,*AR1++(IR0),R7
                ADDF3   R7,R4,R2
                SUBF3   R4,R7,R3
                RPTB    LOOP1_2
                LDF     *+AR2(IR0),R0       ; R0 = X(I2)
                LDF     *+AR3(IR0),R1       ; R1 = X(I3)
                ADDF3   R1,*AR4,R4          ; R4 = X(I3) + X(I4)
||              STF     R3,*AR3++(IR0)      ; X(I3) <- R3
                SUBF3   R1,*AR4++(IR0),R5   ; R5 = -[X(I3) - X(I4)]
||              STF     R5,*-AR4(IR0)       ; X(I4) <- R5
                SUBF3   R0,*AR1,R6          ; R6 = X(I1) - X(I2)
||              STF     R6,*AR2++(IR0)      ; X(I2) <- R6
                ADDF3   R0,*AR1++(IR0),R7   ; R7 = X(I1) + X(I2)
||              STF     R2,*-AR1(IR0)       ; X(I1) <- R2
                ADDF3   R7,R4,R2            ; R2 = R7 + R4
LOOP1_2:        SUBF3   R4,R7,R3            ; R3 = R7 - R4
                STF     R3,*AR3
                STF     R5,*-AR4(IR0)
                STF     R6,*AR2
                STF     R2,*-AR1(IR0)
;
; Perform third FFT loop.
;
; Part A:
;
;   AR1 -> I1   0  <- X(I1) + X(I3)
;               1
;          I2   2
;               3
;   AR2 -> I3   4  <- X(I1) - X(I3)
;               5
;   AR3 -> I4   6  <- -X(I4)
;               7
;   AR1 ->      8
;               9
;
                LDI     @DEST_ADDR,AR1
                LDI     AR1,AR2
                LDI     AR1,AR3
                ADDI    4,AR2
                ADDI    6,AR3
                LDI     8,IR0
                LDI     @FFT_SIZE,RC
                LSH     -3,RC
                SUBI    2,RC
                SUBF3   *AR2,*AR1,R1
                ADDF3   *AR2,*AR1,R2
                NEGF    *AR3,R3
                RPTB    LOOP3_A
                LDF     *+AR2(IR0),R0       ; R0 = X(I3)
||              STF     R2,*AR1++(IR0)      ; X(I1) <- R2
                SUBF3   R0,*AR1,R1          ; R1 = X(I1) - X(I3)
||              STF     R1,*AR2++(IR0)      ; X(I3) <- R1
                ADDF3   R0,*AR1,R2          ; R2 = X(I1) + X(I3)
||              STF     R3,*AR3++(IR0)      ; X(I4) <- R3
LOOP3_A:        NEGF    *AR3,R3             ; R3 = -X(I4)
                STF     R2,*AR1             ; X(I1) <- R2
||              STF     R1,*AR2             ; X(I3) <- R1
                STF     R3,*AR3             ; X(I4) <- R3
;
; Part B:
;
;               0
;   AR0 -> I1   1  <- X[I1] + [X(I3)*COS + X(I4)*COS]
;               2
;   AR1 -> I2   3  <- X[I1] - [X(I3)*COS + X(I4)*COS]
;               4
;   AR2 -> I3   5  <- -X[I2] - [X(I3)*COS - X(I4)*COS]
;               6
;   AR3 -> I4   7  <- X[I2] - [X(I3)*COS - X(I4)*COS]
;               8
;   AR0 ->      9
;
;   NOTE: COS(2*pi/8) = SIN(2*pi/8)
;
                LDI     @FFT_SIZE,RC
                LSH     -3,RC
                LDI     RC,IR1
                SUBI    3,RC
                LDI     8,IR0
                LDI     @DEST_ADDR,AR0
                LDI     AR0,AR1
                LDI     AR0,AR2
                LDI     AR0,AR3
                ADDI    1,AR0
                ADDI    3,AR1
                ADDI    5,AR2
                ADDI    7,AR3
                LDI     @SINE_TABLE,AR7     ; Initialize table pointers.
                LDF     *++AR7(IR1),R7      ; R7 = COS(2*pi/8)
                                            ; *AR7 = COS(2*pi/8)
                MPYF3   *AR7,*AR2,R0        ; R0 = X(I3)*COS
                MPYF3   *AR3,R7,R1          ; R1 = X(I4)*COS
                ADDF3   R0,R1,R2            ; R2 = [X(I3)*COS + X(I4)*COS]
                MPYF3   *AR7,*+AR2(IR0),R0  ; R0 = X(I3)*COS
||              SUBF3   R0,R1,R3            ; R3 = -[X(I3)*COS - X(I4)*COS]
                SUBF3   *AR1,R3,R4          ; R4 = -X(I2) + R3
                ADDF3   *AR1,R3,R4          ; R4 = X(I2) + R3
||              STF     R4,*AR2++(IR0)      ; X(I3) <- R4
                SUBF3   R2,*AR0,R4          ; R4 = X(I1) - R2
||              STF     R4,*AR3++(IR0)      ; X(I4) <- R4
                ADDF3   *AR0,R2,R4          ; R4 = X(I1) + R2
||              STF     R4,*AR1++(IR0)      ; X(I2) <- R4
                RPTB    LOOP3_B
                MPYF3   *AR3,R7,R1
||              STF     R4,*AR0++(IR0)      ; X(I1) <- R4
                ADDF3   R0,R1,R2
                MPYF3   *AR7,*+AR2(IR0),R0
||              SUBF3   R0,R1,R3
                SUBF3   *AR1,R3,R4
                ADDF3   *AR1,R3,R4
||              STF     R4,*AR2++(IR0)
                SUBF3   R2,*AR0,R4
||              STF     R4,*AR3++(IR0)
LOOP3_B:        ADDF3   *AR0,R2,R4
||              STF     R4,*AR1++(IR0)
                MPYF3   *AR3,R7,R1
||              STF     R4,*AR0++(IR0)
                ADDF3   R0,R1,R2
                SUBF3   R0,R1,R3
                SUBF3   *AR1,R3,R4
                ADDF3   *AR1,R3,R4
||              STF     R4,*AR2
                SUBF3   R2,*AR0,R4
||              STF     R4,*AR3
                ADDF3   *AR0,R2,R4
||              STF     R4,*AR1
                STF     R4,*AR0
;
; Perform fourth FFT loop.
;
; Part A:
;
;   AR1 -> I1   0  <- X(I1) + X(I3)
;               1
;               2
;               3
;          I2   4
;               5
;               6
;               7
;   AR2 -> I3   8  <- X(I1) - X(I3)
;               9
;              10
;              11
;   AR3 -> I4  12  <- -X(I4)
;              13
;              14
;              15
;   AR1 ->     16
;              17
;
                LDI     @DEST_ADDR,AR1
                LDI     AR1,AR2
                LDI     AR1,AR3
                ADDI    8,AR2
                ADDI    12,AR3
                LDI     16,IR0
                LDI     @FFT_SIZE,RC
                LSH     -4,RC
                SUBI    2,RC
                SUBF3   *AR2,*AR1,R1
                ADDF3   *AR2,*AR1,R2
                NEGF    *AR3,R3
                RPTB    LOOP4_A
                LDF     *+AR2(IR0),R0       ; R0 = X(I3)
||              STF     R2,*AR1++(IR0)      ; X(I1) <- R2
                SUBF3   R0,*AR1,R1          ; R1 = X(I1) - X(I3)
||              STF     R1,*AR2++(IR0)      ; X(I3) <- R1
                ADDF3   R0,*AR1,R2          ; R2 = X(I1) + X(I3)
||              STF     R3,*AR3++(IR0)      ; X(I4) <- R3
LOOP4_A:        NEGF    *AR3,R3             ; R3 = -X(I4)
                STF     R2,*AR1             ; X(I1) <- R2
||              STF     R1,*AR2             ; X(I3) <- R1
                STF     R3,*AR3             ; X(I4) <- R3


Example 6-16. Real Forward Radix-2 FFT (Continued) 

; Part B:
;
;                       0
;  AR0 -> I1 (3rd)      1  <- X(I1) + [X(I3)*COS + X(I4)*SIN]
;         I1 (2nd)      2
;         I1 (1st)      3
;                       4
;         I2 (1st)      5
;         I2 (2nd)      6
;  AR1 -> I2 (3rd)      7  <- X(I1) - [X(I3)*COS + X(I4)*SIN]
;                       8
;  AR2 -> I3 (3rd)      9  <- -X(I2) - [X(I3)*SIN - X(I4)*COS]
;         I3 (2nd)     10
;  AR4 -> I3 (1st)     11
;                      12
;         I4 (1st)     13
;         I4 (2nd)     14
;  AR3 -> I4 (3rd)     15  <- X(I2) - [X(I3)*SIN - X(I4)*COS]
;                      16
;  AR0 ->              17

        LDI     @FFT_SIZE,RC
        LSH     -4,RC
        LDI     RC,IR1
        LDI     2,IR0
        SUBI    3,RC
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        LDI     AR0,AR3
        LDI     AR0,AR4
        ADDI    1,AR0
        ADDI    7,AR1
        ADDI    9,AR2
        ADDI    15,AR3
        ADDI    11,AR4
        LDI     @SINE_TABLE,AR7
        LDF     *++AR7(IR1),R7  ; R7 = SIN(1*[2*pi/16])
                                ; *AR7 = COS(3*[2*pi/16])
        LDI     AR7,AR6
        LDF     *++AR6(IR1),R6  ; R6 = SIN(2*[2*pi/16])
                                ; *AR6 = COS(2*[2*pi/16])
        LDI     AR6,AR5
        LDF     *++AR5(IR1),R5  ; R5 = SIN(3*[2*pi/16])
                                ; *AR5 = COS(1*[2*pi/16])
        LDI     16,IR1


Example 6-16. Real Forward Radix-2 FFT (Continued)

        MPYF3   *AR7,*AR4,R0        ; R0 = X(I3)*COS(3)
        MPYF3   *++AR2(IR0),R5,R4   ; R4 = X(I3)*SIN(3)
        MPYF3   *--AR3(IR0),R5,R1   ; R1 = X(I4)*SIN(3)
        MPYF3   *AR7,*AR3,R0        ; R0 = X(I4)*COS(3)
||      ADDF3   R0,R1,R2            ; R2 = [X(I3)*COS + X(I4)*SIN]
        MPYF3   *AR6,*-AR4,R0
||      SUBF3   R4,R0,R3            ; R3 = -[X(I3)*SIN - X(I4)*COS]
        SUBF3   *--AR1(IR0),R3,R4   ; R4 = -X(I2) + R3
        ADDF3   *AR1,R3,R4          ; R4 = X(I2) + R3
||      STF     R4,*AR2--           ; X(I3) <-
        SUBF3   R2,*++AR0(IR0),R4   ; R4 = X(I1) - R2
||      STF     R4,*AR3             ; X(I4) <-
        ADDF3   *AR0,R2,R4          ; R4 = X(I1) + R2
||      STF     R4,*AR1             ; X(I2) <-
        MPYF3   *++AR3,R6,R1
||      STF     R4,*AR0             ; X(I1) <-
        ADDF3   R0,R1,R2
        MPYF3   *AR5,*-AR4(IR0),R0
||      SUBF3   R0,R1,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        MPYF3   *--AR2,R7,R4
||      STF     R4,*AR0
        MPYF3   *++AR3,R7,R1
        MPYF3   *AR5,*AR3,R0
||      ADDF3   R0,R1,R2
        MPYF3   *AR7,*++AR4(IR1),R0
||      SUBF3   R4,R0,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2++(IR1)
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3++(IR1)
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1++(IR1)
        RPTB    LOOP4_B
        MPYF3   *++AR2(IR0),R5,R4
||      STF     R4,*AR0++(IR1)
        MPYF3   *--AR3(IR0),R5,R1
        MPYF3   *AR7,*AR3,R0
||      ADDF3   R0,R1,R2
        MPYF3   *AR6,*-AR4,R0
||      SUBF3   R4,R0,R3
        SUBF3   *--AR1(IR0),R3,R4
        ADDF3   *AR1,R3,R4


Example 6-16. Real Forward Radix-2 FFT (Continued) 


||      STF     R4,*AR2--
        SUBF3   R2,*++AR0(IR0),R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        MPYF3   *++AR3,R6,R1
||      STF     R4,*AR0
        ADDF3   R0,R1,R2
        MPYF3   *AR5,*-AR4(IR0),R0
||      SUBF3   R0,R1,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        MPYF3   *--AR2,R7,R4
||      STF     R4,*AR0
        MPYF3   *++AR3,R7,R1
        MPYF3   *AR5,*AR3,R0
||      ADDF3   R0,R1,R2
        MPYF3   *AR7,*++AR4(IR1),R0
||      SUBF3   R4,R0,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2++(IR1)
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3++(IR1)
LOOP4_B:
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1++(IR1)
        MPYF3   *++AR2(IR0),R5,R4
||      STF     R4,*AR0++(IR1)
        MPYF3   *--AR3(IR0),R5,R1
        MPYF3   *AR7,*AR3,R0
||      ADDF3   R0,R1,R2
        MPYF3   *AR6,*-AR4,R0
||      SUBF3   R4,R0,R3
        SUBF3   *--AR1(IR0),R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2--
        SUBF3   R2,*++AR0(IR0),R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        MPYF3   *++AR3,R6,R1
||      STF     R4,*AR0
        ADDF3   R0,R1,R2
        MPYF3   *AR5,*-AR4(IR0),R0




Example 6-16. Real Forward Radix-2 FFT (Continued) 

||      SUBF3   R0,R1,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        MPYF3   *--AR2,R7,R4
||      STF     R4,*AR0
        MPYF3   *++AR3,R7,R1
        MPYF3   *AR5,*AR3,R0
||      ADDF3   R0,R1,R2
        SUBF3   R4,R0,R3
        SUBF3   *++AR1,R3,R4
        ADDF3   *AR1,R3,R4
||      STF     R4,*AR2
        SUBF3   R2,*--AR0,R4
||      STF     R4,*AR3
        ADDF3   *AR0,R2,R4
||      STF     R4,*AR1
        STF     R4,*AR0


Example 6-16. Real Forward Radix-2 FFT (Continued) 


; Perform remaining FFT loops (loop 5 onwards).
;
;                        LOOP
;                       1st  2nd
;         X'(I1)          0    0  <- X'(I1) + X'(I3)
;  AR1 -> X(I1) (1st)     1    1  <- X(I1) + [X(I3)*COS + X(I4)*SIN]
;         X(I1) (2nd)     2    2
;         X(I1) (3rd)     3    3
;  A   ->
;         X'(I2)          8   16
;  B   ->
;         X(I2) (3rd)    13   29
;         X(I2) (2nd)    14   30
;  AR2 -> X(I2) (1st)    15   31  <- X(I1) - [X(I3)*COS + X(I4)*SIN]
;         X'(I3)         16   32  <- X'(I1) - X'(I3)
;         X(I3) (1st)    17   33  <- -X(I2) - [X(I3)*SIN - X(I4)*COS]
;         X(I3) (2nd)    18   34
;         X(I3) (3rd)    19   35
;  C   ->
;         X'(I4)         24   48  <- -X'(I4)
;  D   ->
;         X(I4) (3rd)    29   61
;         X(I4) (2nd)    30   62
;  AR4 -> X(I4) (1st)    31   63  <- X(I2) - [X(I3)*SIN - X(I4)*COS]
;                        32   64
;  AR1 ->                33   65

        LDI     @FFT_SIZE,IR0
        LSH     -2,IR0
        STI     IR0,@SEPARATION
        LSH     -2,IR0
        LDI     5,R5
        LDI     3,R7
        LDI     16,R6
        LDI     @DEST_ADDR,AR5
        LDI     @DEST_ADDR,AR1
        LSH     -1,IR0
        LSH     1,R7


Example 6-16. Real Forward Radix-2 FFT (Continued)

LOOP:
        ADDI    1,R7
        LSH     1,R6
        LDI     AR1,AR4
        ADDI    R7,AR1              ; AR1 points at A.
        LDI     AR1,AR2
        ADDI    2,AR2               ; AR2 points at B.
        ADDI    R6,AR4
        SUBI    R7,AR4              ; AR4 points at D.
        LDI     AR4,AR3
        SUBI    2,AR3               ; AR3 points at C.
        LDI     @SINE_TABLE,AR0     ; AR0 points at SIN/COS table.
        LDI     R7,IR1
        LDI     R7,RC
INLOP:
        ADDF3   *--AR1(IR1),*++AR2(IR1),R0  ; R0 = X'(I1) + X'(I3)
        SUBF3   *--AR3(IR1),*AR1++,R1       ; R1 = X'(I1) - X'(I3)
        NEGF    *--AR4,R2                   ; R2 = -X'(I4)
||      STF     R0,*-AR1                    ; X'(I1) <-
        STF     R1,*AR2--                   ; X'(I3) <-
||      STF     R2,*AR4++(IR1)              ; X'(I4) <-
        LDI     @SEPARATION,IR1     ; IR1=SEPARATION BETWEEN SIN/COS TBLS
        SUBI    3,RC
        MPYF3   *++AR0(IR0),*AR4,R4
        MPYF3   *AR0,*++AR3,R1
        MPYF3   *++AR0(IR1),*AR4,R0
        MPYF3   *AR0,*AR3,R0
||      SUBF3   R1,R0,R3
        MPYF3   *++AR0(IR0),*-AR4,R0
||      ADDF3   R0,R4,R2
        SUBF3   *AR2,R3,R4
        ADDF3   *AR2,R3,R4
||      STF     R4,*AR3++
        SUBF3   R2,*AR1,R4
||      STF     R4,*AR4--
        ADDF3   *AR1,R2,R4
||      STF     R4,*AR2--
        RPTB    IN_BLK
        LDF     *-AR0(IR1),R3
        MPYF3   *AR4,R3,R4
||      STF     R4,*AR1++
        MPYF3   *AR3,R3,R1
        MPYF3   *AR0,*AR3,R0
||      SUBF3   R1,R0,R3
        MPYF3   *++AR0(IR0),*-AR4,R0
||      ADDF3   R0,R4,R2
        SUBF3   *AR2,R3,R4
        ADDF3   *AR2,R3,R4
||      STF     R4,*AR3++
        SUBF3   R2,*AR1,R4
||      STF     R4,*AR4--


Example 6-16. Real Forward Radix-2 FFT (Continued) 

IN_BLK:
        ADDF3   *AR1,R2,R4
||      STF     R4,*AR2--
        LDF     *-AR0(IR1),R3
        MPYF3   *AR4,R3,R4
||      STF     R4,*AR1++
        MPYF3   *AR3,R3,R1
        MPYF3   *AR0,*AR3,R0
||      SUBF3   R1,R0,R3
        LDI     R6,IR1
        ADDF3   R0,R4,R2
        SUBF3   *AR2,R3,R4
        ADDF3   *AR2,R3,R4
||      STF     R4,*AR3++(IR1)
        SUBF3   R2,*AR1,R4
||      STF     R4,*AR4++(IR1)
        ADDF3   *AR1,R2,R4
||      STF     R4,*AR2++(IR1)
        STF     R4,*AR1++(IR1)
        SUBI3   AR5,AR1,R0
        CMPI    @FFT_SIZE,R0
        BLTD    INLOP           ; LOOP BACK TO THE INNER LOOP
        LDI     @SINE_TABLE,AR0 ; AR0 POINTS TO SIN/COS TABLE
        LDI     R7,IR1
        LDI     R7,RC
        ADDI    1,R5
        CMPI    @LOG_SIZE,R5
        BLED    LOOP
        LDI     @DEST_ADDR,AR1
        LSH     -1,IR0
        LSH     1,R7

; Return to C environment.

        POP     DP              ; Restore C environment
                                ; variables.
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        POP     FP
        RETS

        .end


Example 6-17 shows the implementation of a radix-2 real inverse FFT. The
inverse transformation assumes that the input data is in the same order as the
output of the forward transformation, and it produces the time signal in the
proper order. In other words, bit reversing (optional, selected by the
BIT_REVERSE argument) takes place at the end of the program.
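The bit reversing mentioned above moves the element at index i to the index formed by reversing the LOG_SIZE low-order bits of i. The following is a minimal C sketch of that permutation. It assumes a power-of-two length and single-precision data, and `bit_reverse_inplace` is a hypothetical name. Example 6-17 handles even and odd locations in separate passes using the processor's bit-reversed addressing. This sketch instead walks every index once and swaps each pair a single time:

```c
#include <stddef.h>

/* Reverse the low log2n bits of index i. */
static size_t reverse_bits(size_t i, unsigned log2n)
{
    size_t r = 0;
    for (unsigned b = 0; b < log2n; b++) {
        r = (r << 1) | (i & 1u);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Hypothetical helper: permute x[0 .. 2^log2n - 1] into bit-reversed
   order in place. A pair is swapped only when i < j, so each pair is
   exchanged exactly once. */
void bit_reverse_inplace(float *x, unsigned log2n)
{
    size_t n = (size_t)1 << log2n;
    for (size_t i = 0; i < n; i++) {
        size_t j = reverse_bits(i, log2n);
        if (i < j) {
            float t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

The permutation is its own inverse, so applying the function twice restores the original order.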

Example 6-17. Real Inverse Radix-2 FFT 


* Real Inverse FFT
*****************************************************************************
* FILENAME   : Ifft_rl.asm
* WRITTEN BY : Daniel Mazzocco
*              Texas Instruments, Houston
* DATE       : 18th Feb 1992
* VERSION    : 1.0
*****************************************************************************
* VER   DATE         COMMENTS
* 1.0   18th Feb 92  Original release. Started from forward real FFT
*                    routine written by Alex Tessarolo, rev 2.0.
*****************************************************************************
* SYNOPSIS: int ifft_rl( FFT_SIZE, LOG_SIZE, SOURCE_ADDR,
*                        DEST_ADDR, SINE_TABLE, BIT_REVERSE );
*
*           int   FFT_SIZE      ; 64, 128, 256, 512, 1024, ...
*           int   LOG_SIZE      ; 6, 7, 8, 9, 10, ...
*           float *SOURCE_ADDR  ; Points to where data is originated
*                               ; and operated on.
*           float *DEST_ADDR    ; Points to where data will be stored.
*           float *SINE_TABLE   ; Points to the SIN/COS table.
*           int   BIT_REVERSE   ; = 0, bit reversing is disabled.
*                               ; <> 0, bit reversing is enabled.
*
* NOTE: 1) If SOURCE_ADDR = DEST_ADDR, then in-place bit reversing
*          is performed, if enabled (more processor intensive).
*       2) FFT_SIZE must be >= 64 (this is not checked).


Example 6-17. Real Inverse Radix-2 FFT (Continued)

* DESCRIPTION: Generic function to do an inverse radix-2 FFT computation
*              on the C30.
*
*              The data array is FFT_SIZE long with real and imaginary
*              points R and I as follows:
*
*              SOURCE_ADDR[0]          -> R(0)
*                                         R(1)
*                                         R(2)
*                                         R(3)
*                                          .
*                                         R(FFT_SIZE/2)
*                                         I(FFT_SIZE/2 - 1)
*                                          .
*                                         I(2)
*              SOURCE_ADDR[FFT_SIZE-1] -> I(1)
*
*              The output data array will contain only real values.
*              Bit reversal is optionally implemented at the end
*              of the function.
*
*              The sine/cosine table for the twiddle factors is expected
*              to be supplied in the following format:
*
*              SINE_TABLE[0]            -> sin(0*2*pi/FFT_SIZE)
*                                          sin(1*2*pi/FFT_SIZE)
*                                           .
*                                          sin((FFT_SIZE/2-2)*2*pi/FFT_SIZE)
*              SINE_TABLE[FFT_SIZE/2-1] -> sin((FFT_SIZE/2-1)*2*pi/FFT_SIZE)
*
*              NOTE: The table is the first half period of a sine wave.
*
* Stack structure upon call:
*
*              -FP(7) -> BIT_REVERSE
*              -FP(6) -> SINE_TABLE
*              -FP(5) -> DEST_ADDR
*              -FP(4) -> SOURCE_ADDR
*              -FP(3) -> LOG_SIZE
*              -FP(2) -> FFT_SIZE
*              -FP(1) -> return addr
*              -FP(0) -> old FP
*
*****************************************************************************


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


* NOTE: Calling C program can be compiled using either large
*       or small model.
*
* WARNING: DP initialized only once in the program. Be wary
*          with interrupt service routines. Make sure interrupt
*          service routines save the DP pointer.
*
* WARNING: The SOURCE_ADDR must be aligned such that the first
*          LOG_SIZE bits are zero (this is not checked by the
*          program).
*
* REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7
*                 AR0, AR1, AR2, AR3, AR4, AR5, AR6, AR7
*                 IR0, IR1
*                 RC, RS, RE
*                 DP
*
* MEMORY REQUIREMENTS: Program = 322 words (approximately)
*                      Data    = 7 words
*                      Stack   = 12 words
*
*****************************************************************************
*
* BENCHMARKS: Assumptions - Program in RAM0
*                         - Reserved data in RAM0
*                         - Stack on primary/expansion bus RAM
*                         - Sine/cosine tables in RAM0
*                         - Processing and data destination in RAM1
*                         - Primary/expansion bus RAM, 0 wait state
*
*             FFT Size   Bit Reversing   Data Source   Cycles (C30)
*             --------   -------------   -----------   -------------
*             1024       OFF             RAM1          25892 approx.
*
*             Note: This number does not include the C callable
*                   overheads. Add 57 cycles for these overheads.
*
*****************************************************************************

FP              .set    AR3
                .global _ifft_rl            ; Entry execution point.

FFT_SIZE:       .usect  ".ifftdata",1       ; Reserve memory for arguments.
LOG_SIZE:       .usect  ".ifftdata",1
SOURCE_ADDR:    .usect  ".ifftdata",1
DEST_ADDR:      .usect  ".ifftdata",1
SINE_TABLE:     .usect  ".ifftdata",1
BIT_REVERSE:    .usect  ".ifftdata",1
SEPARATION:     .usect  ".ifftdata",1


Example 6-17. Real Inverse Radix-2 FFT (Continued) 



; Initialize C Function.

        .sect   ".iffttext"
_ifft_rl:
        PUSH    FP              ; Preserve C environment.
        LDI     SP,FP
        PUSH    R4
        PUSH    R5
        PUSH    R6
        PUSHF   R6
        PUSH    R7
        PUSHF   R7
        PUSH    AR4
        PUSH    AR5
        PUSH    AR6
        PUSH    AR7
        PUSH    DP
        LDP     FFT_SIZE        ; Initialize DP pointer.
        LDI     *-FP(2),R0      ; Move arguments from stack.
        STI     R0,@FFT_SIZE
        LDI     *-FP(3),R0
        STI     R0,@LOG_SIZE
        LDI     *-FP(4),R0
        STI     R0,@SOURCE_ADDR
        LDI     *-FP(5),R0
        STI     R0,@DEST_ADDR
        LDI     *-FP(6),R0
        STI     R0,@SINE_TABLE
        LDI     *-FP(7),R0
        STI     R0,@BIT_REVERSE


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


; Perform last FFT loops first (loop 2 onwards).
;
;                        LOOP
;                       1st  2nd
;         X'(I1)          0    0  <- X'(I1) + X'(I3)
;  AR1 -> X(I1) (1st)     1    1  <- X(I1) + X(I2)
;         X(I1) (2nd)     2    2
;         X(I1) (3rd)     3    3
;  A   ->
;         X'(I2)          8   16  <- X'(I2)*2
;  B   ->
;         X(I2) (3rd)    13   29
;         X(I2) (2nd)    14   30
;  AR2 -> X(I2) (1st)    15   31  <- X(I4) - X(I3)
;         X'(I3)         16   32  <- X'(I1) - X'(I3)
;  AR3 -> X(I3) (1st)    17   33  <- [X(I1)-X(I2)]*COS - [X(I3)+X(I4)]*SIN
;         X(I3) (2nd)    18   34
;         X(I3) (3rd)    19   35
;  C   ->
;         X'(I4)         24   48  <- -X'(I4)*2
;  D   ->
;         X(I4) (3rd)    29   61
;         X(I4) (2nd)    30   62
;  AR4 -> X(I4) (1st)    31   63  <- [X(I1)-X(I2)]*SIN + [X(I3)+X(I4)]*COS
;                        32   64
;  AR1 ->                33   65

        LDI     1,IR0               ; Step between two consecutive sines.
        LDI     4,R5                ; Stage number from 4 to M.
        LDI     @FFT_SIZE,R7
        LSH     -2,R7               ; R7 is FFT_SIZE/4-1 (i.e., 15 for 64 pts)
        SUBI    1,R7                ; and will be used to point at A & D.
        LDI     @FFT_SIZE,R6        ; R6 will be used to point at D.
        LSH     1,R6
        LDI     @SOURCE_ADDR,AR5
        LDI     @SOURCE_ADDR,AR1
LOOP:
        LSH     -1,R6               ; R6 is FFT_SIZE at the 1st loop.
        LDI     AR1,AR4
        ADDI    R7,AR1              ; AR1 points at A.


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


        LDI     AR1,AR2
        ADDI    2,AR2               ; AR2 points at B.
        ADDI    R6,AR4
        SUBI    R7,AR4              ; AR4 points at D.
        LDI     AR4,AR3
        SUBI    2,AR3               ; AR3 points at C.
        LDI     R7,IR1
        LDI     R7,RC
INLOP:
        ADDF3   *--AR1(IR1),*--AR3(IR1),R0  ; R0 = X'(I1) + X'(I3)
        SUBF3   *AR3,*AR1,R1                ; R1 = X'(I1) - X'(I3)
        LDF     *--AR4,R2
        STF     R0,*AR1++                   ; X'(I1) <-
        MPYF    -2.0,R2                     ; R2 = -2*X'(I4)
        LDF     *--AR2,R3
        STF     R1,*AR3++                   ; X'(I3) <-
        MPYF    2.0,R3                      ; R3 = 2*X'(I2)
        STF     R3,*AR2++(IR1)              ; X'(I2) <-
        STF     R2,*AR4++(IR1)              ; X'(I4) <-
        LDI     @FFT_SIZE,IR1
        LDI     @SINE_TABLE,AR0
        LSH     -2,IR1
        SUBI    3,RC
        SUBF3   *AR2,*AR1,R3        ; R3 = X(I1) - X(I2)
        ADDF3   *AR1,*AR2,R2        ; R2 = X(I1) + X(I2)
        MPYF3   R3,*++AR0(IR0),R1   ; R1 = R3*SIN
        LDF     *AR4,R4             ; R4 = X(I4)
        MPYF3   R3,*++AR0(IR1),R0   ; R0 = R3*COS
||      SUBF3   *AR3,R4,R3          ; R3 = X(I4) - X(I3)
        ADDF3   R4,*AR3,R2          ; R2 = X(I3) + X(I4)
||      STF     R2,*AR1++           ; X(I1) <-
        MPYF3   R2,*AR0--(IR1),R4   ; R4 = R2*COS
||      STF     R3,*AR2--           ; X(I2) <-
        ADDF3   R4,R1,R3            ; R3 = R3*SIN + R2*COS
        MPYF3   R2,*AR0,R1          ; R1 = R2*SIN
||      STF     R3,*AR4--           ; X(I4) <-
        SUBF3   R1,R0,R4            ; R4 = R3*COS - R2*SIN
        RPTB    IN_BLK


Example 6-17. Real Inverse Radix-2 FFT (Continued)

        SUBF3   *AR2,*AR1,R3        ; R3 = X(I1) - X(I2)
        ADDF3   *AR1,*AR2,R2        ; R2 = X(I1) + X(I2)
        MPYF3   R3,*++AR0(IR0),R1   ; R1 = R3*SIN
||      STF     R4,*AR3++           ; X(I3) <-
        LDF     *AR4,R4             ; R4 = X(I4)
        MPYF3   R3,*++AR0(IR1),R0   ; R0 = R3*COS
||      SUBF3   *AR3,R4,R3          ; R3 = X(I4) - X(I3)
        ADDF3   R4,*AR3,R2          ; R2 = X(I3) + X(I4)
||      STF     R2,*AR1++           ; X(I1) <-
        MPYF3   R2,*AR0--(IR1),R4   ; R4 = R2*COS
||      STF     R3,*AR2--           ; X(I2) <-
        ADDF3   R4,R1,R3            ; R3 = R3*SIN + R2*COS
        MPYF3   R2,*AR0,R1          ; R1 = R2*SIN
||      STF     R3,*AR4--           ; X(I4) <-
IN_BLK:
        SUBF3   R1,R0,R4            ; R4 = R3*COS - R2*SIN
        SUBF3   *AR2,*AR1,R3        ; R3 = X(I1) - X(I2)
        ADDF3   *AR1,*AR2,R2        ; R2 = X(I1) + X(I2)
        MPYF3   R3,*++AR0(IR0),R1   ; R1 = R3*SIN
||      STF     R4,*AR3++           ; X(I3) <-
        LDF     *AR4,R4             ; R4 = X(I4)
        MPYF3   R3,*++AR0(IR1),R0   ; R0 = R3*COS
||      SUBF3   *AR3,R4,R3          ; R3 = X(I4) - X(I3)
        ADDF3   R4,*AR3,R2          ; R2 = X(I3) + X(I4)
||      STF     R2,*AR1             ; X(I1) <-
        MPYF3   R2,*AR0--(IR1),R4   ; R4 = R2*COS
||      STF     R3,*AR2             ; X(I2) <-
        LDI     R6,IR1              ; Get prepared for the next group.
        ADDF3   R4,R1,R3            ; R3 = R3*SIN + R2*COS
        MPYF3   R2,*AR0,R1          ; R1 = R2*SIN
||      STF     R3,*AR4++(IR1)      ; X(I4) <-
        SUBF3   R1,R0,R4            ; R4 = R3*COS - R2*SIN
        NEGF    *AR1++(IR1),R2      ; Dummy
||      STF     R4,*AR3++(IR1)      ; X(I3) <-
        SUBI3   AR5,AR1,R0
        CMPI    @FFT_SIZE,R0
        BLTD    INLOP               ; Loop back to the inner loop.
        NOP     *AR2++(IR1)         ; Dummy
        LDI     R7,IR1
        LDI     R7,RC
        ADDI    1,R5
        CMPI    @LOG_SIZE,R5
        BLED    LOOP                ; Next stage if any left.
        LDI     @SOURCE_ADDR,AR1
        LSH     1,IR0               ; Double step in sine table.
        LSH     -1,R7


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


; Perform third FFT loop.
;
; Part A:
;
;  AR1 -> I1    0  <- X(I1) + X(I3)
;               1
;  AR2 -> I2    2  <- 2*X(I2)
;               3
;  AR3 -> I3    4  <- X(I1) - X(I3)
;               5
;  AR4 -> I4    6  <- -2*X(I4)
;               7
;  AR1 ->       8
;               9

        LDI     @SOURCE_ADDR,AR1
        LDI     AR1,AR2
        LDI     AR1,AR3
        LDI     AR1,AR4
        ADDI    2,AR2
        ADDI    4,AR3
        ADDI    6,AR4
        LDI     8,IR0
        LDI     @FFT_SIZE,RC
        LSH     -3,RC
        SUBI    1,RC
        LDI     @SINE_TABLE,AR0 ; AR0 points at SIN/COS table.
        RPTB    LOOP3_A
        LDF     *AR3,R3
        ADDF3   R3,*AR1,R0      ; R0 = X'(I1) + X'(I3)
        SUBF3   R3,*AR1,R1      ; R1 = X'(I1) - X'(I3)
        LDF     *AR4,R2
        STF     R0,*AR1++(IR0)  ; X'(I1) <-
        MPYF    -2.0,R2         ; R2 = -2*X'(I4)
        LDF     *AR2,R3
        STF     R1,*AR3++(IR0)  ; X'(I3) <-
        MPYF    2.0,R3          ; R3 = 2*X'(I2)
        STF     R3,*AR2++(IR0)  ; X'(I2) <-
LOOP3_A:
        STF     R2,*AR4++(IR0)  ; X'(I4) <-


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


; Part B:
;
;               0
;  AR1 -> I1    1  <- X(I1) + X(I2)
;               2
;  AR2 -> I2    3  <- X(I4) - X(I3)
;               4
;  AR3 -> I3    5  <- [X(I1) - X(I2)]*COS - [X(I3) + X(I4)]*SIN
;               6
;  AR4 -> I4    7  <- [X(I1) - X(I2)]*SIN + [X(I3) + X(I4)]*COS
;               8
;  AR1 ->       9
;
;  NOTE: COS(2*pi/8) = SIN(2*pi/8)

        LDI     @SOURCE_ADDR,AR1
        LDI     AR1,AR2
        LDI     AR1,AR3
        LDI     AR1,AR4
        ADDI    1,AR1
        ADDI    3,AR2
        ADDI    5,AR3
        ADDI    7,AR4
        LDI     @SINE_TABLE,AR7 ; AR7 points at SIN/COS table.
        LDI     @FFT_SIZE,RC
        LSH     -3,RC
        LDI     RC,IR1
        SUBI    2,RC


Example 6-17. Real Inverse Radix-2 FFT (Continued)

        LDF     *AR2,R6             ; R6 = X(I2)
        LDF     *AR3,R0             ; R0 = X(I3)
        ADDF3   R6,*AR1,R5          ; R5 = X(I1)+X(I2)
        SUBF3   R6,*AR1,R4          ; R4 = X(I1)-X(I2)
        SUBF3   R0,R4,R3            ; R3 = X(I1)-X(I2)-X(I3)
        ADDF3   R0,R4,R2            ; R2 = X(I1)-X(I2)+X(I3)
        SUBF3   R0,*AR4,R1          ; R1 = X(I4)-X(I3)
||      STF     R5,*AR1++(IR0)      ; X(I1) <-
        ADDF3   R2,*AR4,R5          ; R5 = X(I1)-X(I2)+X(I3)+X(I4)
||      STF     R1,*AR2++(IR0)      ; X(I2) <-
        MPYF3   R5,*++AR7(IR1),R1   ; R1 = R5*SIN
||      SUBF3   *AR4,R3,R2          ; R2 = X(I1)-X(I2)-X(I3)-X(I4)
        MPYF3   R2,*AR7,R0          ; R0 = R2*SIN
||      STF     R1,*AR4++(IR0)      ; X(I4) <-
        RPTB    LOOP3_B
        LDF     *AR2,R6             ; R6 = X(I2)
||      STF     R0,*AR3++(IR0)      ; X(I3) <-
        ADDF3   R6,*AR1,R5          ; R5 = X(I1)+X(I2)
        LDF     *AR3,R0             ; R0 = X(I3)
        SUBF3   R6,*AR1,R4          ; R4 = X(I1)-X(I2)
        SUBF3   R0,R4,R3            ; R3 = X(I1)-X(I2)-X(I3)
        ADDF3   R0,R4,R2            ; R2 = X(I1)-X(I2)+X(I3)
        SUBF3   R0,*AR4,R1          ; R1 = X(I4)-X(I3)
||      STF     R5,*AR1++(IR0)      ; X(I1) <-
        ADDF3   R2,*AR4,R5          ; R5 = X(I1)-X(I2)+X(I3)+X(I4)
||      STF     R1,*AR2++(IR0)      ; X(I2) <-
        MPYF3   R5,*AR7,R1          ; R1 = R5*SIN
||      SUBF3   *AR4,R3,R2          ; R2 = X(I1)-X(I2)-X(I3)-X(I4)
LOOP3_B:
        MPYF3   R2,*AR7,R0          ; R0 = R2*SIN
||      STF     R1,*AR4++(IR0)      ; X(I4) <-
        STF     R0,*AR3             ; X(I3) <-


Example 6-17. Real Inverse Radix-2 FFT (Continued) 


; Perform first and second FFT loops.
;
;  AR1 -> I1    0  <- X(I1) + X(I3) + 2*X(I2)
;  AR2 -> I2    1  <- X(I1) + X(I3) - 2*X(I2)
;  AR3 -> I3    2  <- X(I1) - X(I3) - 2*X(I4)
;  AR4 -> I4    3  <- X(I1) - X(I3) + 2*X(I4)
;  AR1 ->       4

        LDI     @SOURCE_ADDR,AR1
        LDI     AR1,AR2
        LDI     AR1,AR3
        LDI     AR1,AR4
        ADDI    1,AR2
        ADDI    2,AR3
        ADDI    3,AR4
        LDI     4,IR0
        LDI     @FFT_SIZE,RC
        LSH     -2,RC
        SUBI    2,RC


Example 6-17. Real Inverse Radix-2 FFT (Continued)

        LDF     *AR4,R6             ; R6 = X(I4)
        LDF     *AR2,R7             ; R7 = X(I2)
||      LDF     *AR1,R1             ; R1 = X(I1)
        MPYF    2.0,R6              ; R6 = 2*X(I4)
        MPYF    2.0,R7              ; R7 = 2*X(I2)
        SUBF3   R6,*AR3,R5          ; R5 = X(I3) - 2*X(I4)
        SUBF3   R5,R1,R4            ; R4 = X(I1)-X(I3)+2X(I4)
        SUBF3   R7,*AR3,R5          ; R5 = X(I3) - 2*X(I2)
||      STF     R4,*AR4++(IR0)      ; X(I4) <-
        ADDF3   R5,R1,R3            ; R3 = X(I1)+X(I3)-2X(I2)
        ADDF3   R6,*AR3,R4          ; R4 = X(I3) + 2*X(I4)
||      STF     R3,*AR2++(IR0)      ; X(I2) <-
        SUBF3   R4,R1,R4            ; R4 = X(I1)-X(I3)-2X(I4)
        ADDF3   R7,*AR3,R0          ; R0 = X(I3) + 2*X(I2)
||      STF     R4,*AR3++(IR0)      ; X(I3) <-
        ADDF3   R0,R1,R0            ; R0 = X(I1)+X(I3)+2X(I2)
        RPTB    LOOP1_2
        LDF     *AR4,R6             ; R6 = X(I4)
||      STF     R0,*AR1++(IR0)      ; X(I1) <-
        MPYF    2.0,R6              ; R6 = 2*X(I4)
        LDF     *AR2,R7             ; R7 = X(I2)
||      LDF     *AR1,R1             ; R1 = X(I1)
        MPYF    2.0,R7              ; R7 = 2*X(I2)
        SUBF3   R6,*AR3,R5          ; R5 = X(I3) - 2*X(I4)
        SUBF3   R5,R1,R4            ; R4 = X(I1)-X(I3)+2X(I4)
        SUBF3   R7,*AR3,R5          ; R5 = X(I3) - 2*X(I2)
||      STF     R4,*AR4++(IR0)      ; X(I4) <-
        ADDF3   R5,R1,R3            ; R3 = X(I1)+X(I3)-2X(I2)
        ADDF3   R6,*AR3,R4          ; R4 = X(I3) + 2*X(I4)
||      STF     R3,*AR2++(IR0)      ; X(I2) <-
        SUBF3   R4,R1,R4            ; R4 = X(I1)-X(I3)-2X(I4)
        ADDF3   R7,*AR3,R0          ; R0 = X(I3) + 2*X(I2)
||      STF     R4,*AR3++(IR0)      ; X(I3) <-
LOOP1_2:
        ADDF3   R0,R1,R0            ; R0 = X(I1)+X(I3)+2X(I2)
        STF     R0,*AR1             ; LAST X(I1) <-


Example 6-17. Real Inverse Radix-2 FFT (Continued) 

; Check bit reversing mode (on or off).
; BIT_REVERSE = 0, then OFF (no bit reversing).
; BIT_REVERSE <> 0, then ON.

        LDI     @BIT_REVERSE,R0
        CMPI    0,R0
        BZ      MOVE_DATA

; Check bit reversing type.
; If SourceAddr = DestAddr, then in place bit reversing.
; If SourceAddr <> DestAddr, then standard bit reversing.

        LDI     @SOURCE_ADDR,R0
        CMPI    @DEST_ADDR,R0
        BEQ     IN_PLACE

; Bit reversing type 1 (from source to destination).
; NOTE: abs(SOURCE_ADDR - DEST_ADDR) must be > FFT_SIZE, this is not checked.

        LDI     @FFT_SIZE,R0
        SUBI    2,R0
        LDI     @FFT_SIZE,IR0
        LSH     -1,IR0          ; IR0 = half FFT size.
        LDI     @SOURCE_ADDR,AR0
        LDI     @DEST_ADDR,AR1
        LDF     *AR0++,R1
        RPTS    R0
        LDF     *AR0++,R1
||      STF     R1,*AR1++(IR0)B
        STF     R1,*AR1++(IR0)B
        BR      DIVISION


Example 6-17. Real Inverse Radix-2 FFT (Continued)

; In-place bit reversing.
; Bit reversing on even locations, 1st half only.

IN_PLACE:
        LDI     @FFT_SIZE,IR0
        LSH     -2,IR0          ; IR0 = quarter FFT size.
        LDI     2,IR1
        LDI     @FFT_SIZE,RC
        LSH     -2,RC
        SUBI    3,RC
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
        LDF     *AR1,R1
        CMPI    AR1,AR0         ; Xchange locations only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV1
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV1:
        LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2

; Perform bit reversing on odd locations, 2nd half only.

        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     @DEST_ADDR,AR0
        ADDI    RC,AR0
        ADDI    1,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        LSH     -1,RC
        SUBI    3,RC
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
Example 6-17. Real Inverse Radix-2 FFT (Continued)


        LDF     *AR1,R1
        CMPI    AR1,AR0         ; Xchange locations only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV2
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV2:
        LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2

; Perform bit reversing on odd locations, 1st half only.

        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     RC,IR0
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        ADDI    1,AR0
        ADDI    IR0,AR1
        LSH     -1,RC
        LDI     RC,IR0
        SUBI    2,RC
        LDF     *AR0,R0
        LDF     *AR1,R1
        RPTB    BITRV3
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR1++(IR0)B
BITRV3:
        LDF     *AR1,R1
||      STF     R1,*-AR0(IR1)
        STF     R0,*AR1
        STF     R1,*AR0
        BR      DIVISION

; Check data source locations.
; If SourceAddr = DestAddr, then do nothing.
; If SourceAddr <> DestAddr, then move data.


Example 6-17. Real Inverse Radix-2 FFT (Continued) 

MOVE_DATA:
        LDI     @SOURCE_ADDR,R0
        CMPI    @DEST_ADDR,R0
        BEQD    DIVISION
        LDI     @FFT_SIZE,R0
        SUBI    2,R0
        LDI     @SOURCE_ADDR,AR0
        LDI     @DEST_ADDR,AR1
        LDF     *AR0++,R1
        RPTS    R0
        LDF     *AR0++,R1
||      STF     R1,*AR1++
        STF     R1,*AR1

DIVISION:
        LDI     2,IR0
        LDI     @FFT_SIZE,R0
        FLOAT   R0              ; exp = LOG_SIZE
        PUSHF   R0              ; 32 MSBs saved
        POP     R0
        NEGI    R0              ; Neg exponent
        PUSH    R0
        POPF    R0              ; R0 = 1/FFT_SIZE
        LDI     @DEST_ADDR,AR1
        LDI     @DEST_ADDR,AR2
        NOP     *AR2++
        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        SUBI    2,RC
        MPYF3   R0,*AR1,R1      ; 1st location
        RPTB    LAST_LOOP
        MPYF3   R0,*AR2,R2      ; 2nd, 4th, 6th, ... location
||      STF     R1,*AR1++(IR0)
LAST_LOOP:
        MPYF3   R0,*AR1,R1      ; 3rd, 5th, 7th, ... location
||      STF     R2,*AR2++(IR0)
        MPYF3   R0,*AR2,R2      ; Last location
||      STF     R1,*AR1
        STF     R2,*AR2


Example 6-17. Real Inverse Radix-2 FFT (Continued)

; Return to C environment.

        POP     DP              ; Restore C environment variables
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        POP     FP
        RETS

        .end

* No more.
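The DIVISION section of Example 6-17 scales the output by 1/FFT_SIZE without a divide. FLOAT leaves LOG_SIZE in the exponent of a value whose mantissa is zero. PUSHF/POP then moves that bit pattern into an integer register, NEGI negates the exponent field, and PUSH/POPF moves the word back, which gives 2^-LOG_SIZE.

The 'C3x floating-point format is not IEEE 754, so the C sketch below is only a host-side analogue of the same idea. It assumes IEEE 754 single precision, and `recip_pow2` and `scale_by_inverse_n` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1/2^log2n by writing the biased exponent
   directly (sign 0, mantissa 0). Valid for log2n in 0..126, where the
   result is still a normal single-precision value. */
float recip_pow2(unsigned log2n)
{
    uint32_t bits = (uint32_t)(127u - log2n) << 23;
    float r;
    memcpy(&r, &bits, sizeof r);    /* reinterpret the bit pattern */
    return r;
}

/* Hypothetical helper: multiply each of the 2^log2n samples by 1/N,
   the same job as the MPYF3/STF loop that follows DIVISION. */
void scale_by_inverse_n(float *x, unsigned log2n)
{
    const float k = recip_pow2(log2n);
    size_t n = (size_t)1 << log2n;
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}
```

Writing the exponent field yields an exact reciprocal only for powers of two. That restriction fits FFT sizes; a general reciprocal still needs a true division or an iterative method.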


The ’C3x quickly executes FFT lengths up to 1024 points (complex) or 2048
(real), which covers most applications. It performs this task almost entirely in
on-chip memory. See Table 6-2 on page 6-79 for the number of CPU clock cycles
and the execution time required for FFT lengths between 64 and 1024 points
for the four algorithms.


6.7 TMS320C3X Benchmarks 

Table 6-1 provides benchmarks for common DSP operations. Table 6-2
summarizes the FFT execution time required for FFT lengths between 64 and 1024
points for the algorithms in Example 6-13, Example 6-15, Example 6-16,
and Example 6-17 beginning on page 6-31.

The benchmarks are given in clock cycles (the H1 internal processor cycle).
To convert a benchmark to time, multiply the number of cycles by the processor’s
internal clock period. For example, for a 60-MHz ’C3x, multiply by 33 ns.
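The conversion described above is a single multiplication. The following is a minimal C sketch. It assumes the cycle period is given in nanoseconds: 33 ns in the 60-MHz example, because the 'C3x H1 cycle runs at half the input clock frequency. `cycles_to_us` is a hypothetical name:

```c
/* Hypothetical helper: convert a benchmark in H1 cycles to microseconds.
   cycle_ns is the internal clock period in nanoseconds (33.0 for the
   60-MHz example in the text). */
double cycles_to_us(unsigned long cycles, double cycle_ns)
{
    return (double)cycles * cycle_ns / 1000.0;
}
```

For instance, at 33 ns per cycle, the 19 820 cycles listed in Table 6-2 for a 1024-point real FFT come to about 654 µs.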

Table 6-1. TMS320C3x Application Benchmarks

Application                                      Words     Cycles
-----------------------------------------------  --------  ------------------
Inverse of a floating-point number               31        31
  (32-bit precision)
Square root                                      38        46
Double precision integer add/subtract            2         2
Double precision integer multiply                24        24
IEEE to ’C3x format conversion (fast)            12        9
IEEE to ’C3x format conversion (complete)        33        19
’C3x to IEEE format conversion (fast)            14        10
’C3x to IEEE format conversion (complete)        24        27
FIR filter                                       5         6 + N
IIR filter (one biquad)                          7         7
IIR filter (N > 1 biquads)                       16        13 + 6N
LMS adaptive FIR filter                          11        13 + 3N
Matrix-vector multiplication                     10        2 + 10K + K(N - 1)
Vector dot product                               6         N + 4
Vector maximum                                   5         2 + 3N
Forward LPC lattice filter                       11        5 + 3P
Inverse LPC lattice filter                       9         6 + 3P
μ-law (A-law) compression                        16 (18)   16 (18)
μ-law (A-law) expansion                          13 (15)   16 (21)
Table 6-2. TMS320C3x FFT Timing Benchmarks (Assumes Data On Chip and
No Bit Reversing)

                         Number of CPU Clock Cycles
Number of   Radix-2          Radix-4     Radix-2     Radix-2
Points      (Complex)        (Complex)   (Real)      (Real Inverse)
64          1481             2050        791         1064
128         3445             -           1746        2369
256         7865             10400       3925        5282
512         17 709                       8840        11731
            17 709 (’C31)
            42 210 (’C32)
1024        39 600 (’C30)    50 670      19 820      25 900
            40 100 (’C31)
            94 519 (’C32)
512         25 688 (’C32)
1024        64 781 (’C32)
2048        11 611 (’C30)
            117 400 (’C31)
4096        280 800 (’C30)
            283 600 (’C31)

These benchmarks include C overhead: they represent the number of cycles 
between the standard C-compiler _main and _exit labels. 

These benchmarks do not include the final bit-reversing stage. If bit-reversing 
is required, it is implemented in a serial fashion in off-chip memory. 
6.8 Sliding FFT


SFFT.ASM uses a technique known as a sliding FFT (SFFT) to calculate the
spectrum of a signal on a sample-by-sample basis. The SFFT is particularly
well-suited for applications where signal analysis, filtering, modulation,
demodulation, or other forms of signal manipulation in the frequency domain
must be performed in real time. The SFFT algorithm is similar to the discrete
Fourier transform (DFT). The SFFT is equivalent to computing overlapped FFTs
in which each new window is shifted by one sample (an overlap of N - 1
samples), in that the past frequency data is reused to calculate the frequency
spectrum of the next sample window. The calculation is performed by adding the
frequency domain spectrum of the new sample while simultaneously subtracting
the frequency domain spectrum of the oldest sample. Understanding the SFFT
does not require prior knowledge of the DFT or FFT. In addition, the SFFT can
be used to derive the DFT equation, which makes it useful both to DSP
beginners and to DSP experts looking for a different approach to a problem.

6.8.1 SFFT Theory: A Better Way to Use the Impulse Response 

The SFFT is based on the following simple concepts: 

1) The property of superposition allows two or more signals to be added
linearly to create a new signal. A sampled time domain signal is the
summation of a series of individual input samples, or impulses, of varying
magnitude (Figure 6-10a). Similarly, signals, or impulses, can be subtracted.

If an input signal sample buffer (Figure 6-10a) is kept in memory, a sliding
rectangular window of data samples (Figure 6-10b and Figure 6-10d) can be
constructed by adding the newest sample and subtracting the oldest sample
(Figure 6-10c) from the previous windowed signal (Figure 6-10b). The
following diagram shows how adding and subtracting samples slides a window
of data samples from the position shown in Figure 6-10b to the one shown in
Figure 6-10d.
Figure 6-10. Input Signal Sample Buffer

[Figure panels: a) Input signal sample buffer (older to newer samples);
b) Original windowed signal; c) New-old sample window (subtract old sample
at T = N, add new sample at T = 0). Note: T = time.]


2) The frequency domain response of an impulse (a single sample point where
all other data points are zero) is a flat frequency response, with a
magnitude in each frequency bin equal to the impulse input magnitude.
Conversely, the impulse is the additive result of many sinusoidal frequency
components. The time at which the impulse occurs within the sample window is
determined by the phase angles of the individual component frequencies: an
impulse’s time of arrival appears as a phase shift that increases linearly
from one frequency bin to the next.


3) In the frequency domain, the addition of frequency samples also follows
the rules of superposition.

The spectrum of Figure 6-10c, the new-old sample window, is added to the
spectrum of Figure 6-10b, the original windowed signal, to create the new
spectrum of Figure 6-10d. The difference is that complex data is used in the
frequency domain to represent the phase information of the individual
component frequencies.

4) The summation of a series of simple impulses, which have correspondingly
simple frequency domain transforms, results in the composite frequency
domain transform of the signal.

5) A sliding rectangular window is created by subtracting the Nth oldest
sample, which, in the frequency domain, will have rotated through an integer
multiple of 2 x pi radians.

Note:

In some applications, complex time domain inputs may also be useful. In this
application, only the REAL data from an ADC is used.
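The time domain half of concept 1, keeping a window current by adding the newest sample and subtracting the oldest, can be sketched in C as a running sum over a circular buffer. The window length WIN and all names are hypothetical. A running sum is exactly what the SFFT’s 0 Hz bin computes, because that bin applies no rotation.

```c
#include <stddef.h>

#define WIN 8  /* hypothetical window length N */

/* Sliding rectangular window kept current by add-newest/subtract-oldest. */
typedef struct {
    double buf[WIN];  /* last WIN samples, oldest at buf[pos] */
    size_t pos;       /* index of the oldest sample */
    double sum;       /* running sum of the window contents */
} sliding_sum;

static double sliding_sum_push(sliding_sum *s, double x)
{
    double oldest = s->buf[s->pos];  /* sample leaving the window */
    s->buf[s->pos] = x;              /* newest sample takes its slot */
    s->pos = (s->pos + 1) % WIN;     /* explicit wrap, no circular addressing */
    s->sum += x - oldest;            /* add newest, subtract oldest */
    return s->sum;
}
```

After pushing the values 1 through 20, the window holds 13 through 20 and the returned sum is 132.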


6.8.2 Frequency Response Calculation 

If an impulse sample occurs at T = 0, the frequency response calculation is
further simplified, since the response contains only REAL and no IMAG
components. The transform of an impulse at T = 0 is obtained simply by
storing the magnitude of the impulse into each REAL bin and zeroing each
IMAG bin.

If T != 0, the time shift creates a phase shift or complex vector rotation within 
each frequency bin. The phase rotation angle is proportional to the time shift 
and the frequency of interest. 

If the time shift is one sample period, as in the SFFT, special conditions
apply. At low frequencies, the phase shift from sample to sample is small; at
0 Hz, it is zero radians. The phase rotation increases with frequency, and at
the Nyquist frequency the vector rotation is pi radians per sample, which
corresponds to 2 samples per sine wave cycle. Vector rotation for bins
between DC and the Nyquist rate is proportional to the bin frequency.
A Fourier transform produces both negative and positive frequencies. For a
REAL input, these are mirror images (complex conjugates) of each other, so
only the positive frequencies need to be computed, which is sufficient for
spectrum analysis and filtering. The ranges for n and the resulting complex
rotation vectors (twiddle factors) for each bin are:

$$\text{Positive frequencies: } 0 \le n < N/2$$

$$\text{Negative frequencies: } -N/2 \le n < 0$$

$$(R_{\mathrm{phase}},\, I_{\mathrm{phase}}) = e^{\,i 2\pi n/N}$$

$$\mathrm{REAL\_tw}[n] = \cos(2\pi n/N), \qquad \mathrm{IMAG\_tw}[n] = \sin(2\pi n/N)$$


The basic SFFT operation is a vector rotation of each previous bin value,
followed by adding the newest sample and subtracting the oldest sample.
Although it is a simple operation, all bins must be computed before the next
input sample is ready.

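$$\mathrm{NewBinVal} = (\mathrm{New} - \mathrm{Old}) + \mathrm{OldBinVal} \times \mathrm{vect\_rotate}$$

$$\mathrm{Bin}[n] = \left(X[0] - X[-N]\right) + \mathrm{Bin}[n]\, e^{\,i 2\pi n/N}$$

where X[0] is the newest sample and X[-N] is the sample N periods old, the
one leaving the window.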


6.8.3 Visualizing the SFFT 

The easiest way to visualize the SFFT is to consider that each new sample
occurs at T = 0, making each new sample all REAL in the frequency domain.
Then, since the past summation is time-shifted by one sample, a vector
rotation proportional to the frequency is applied. A schematic representation
of an SFFT bin is shown in Figure 6-11.


Figure 6-11. Frequency Bin Diagram (Equivalent to an IIR Filter)

[Figure: SFFT frequency bin diagram with an N-sample delay]

Where: Vector_rotation_rate[n] = 2*pi*n/N radians per sample
(2*pi*n*Fs/N radians per second)
K1 and K2 force convergence (see Section 6.8.4)



6.8.4 Fbin Convergence and Stability 

One aspect of the SFFT is a feedback loop that affects the stability of the
bin values. This is similar to an IIR filter with a pole on the unit circle
in the Z domain. To maintain stability and keep the bin values from growing
out of control, the magnitude of the complex vector rotation twiddles is set
slightly less than 1, which places the pole inside the unit circle. This
causes the impulse energy in each bin to decay exponentially toward zero:
with the stability factor K1 applied at each rotation, an impulse decays to
K1^N of its original magnitude after N rotations. To subtract the Nth oldest
sample, that sample is therefore scaled by a second coefficient, K2 = K1^N. A
side effect of the exponential decay is that the SFFT is now windowed by an
exponentially decaying window. To minimize this effect, keep K1 close to
1.000 (0.999, for example).

6.8.5 SFFT Windowing 

Unlike the FFT and DFT, SFFT windowing cannot be performed in the time
domain; the input window is moving in time and, therefore, the window
function must also move in time. The SFFT windowing operation is instead
performed in the frequency domain by convolution. In the time domain,
windowing is a multiplication that smooths out the sharp discontinuities at
the endpoints of a rectangular data window; without a smoothing window, these
abrupt changes smear the frequency spectrum over many bins. In the frequency
domain, the coefficients of most windowing functions are simple and do not
require large storage arrays. For the raised cosine window function, the
coefficients are particularly simple (-0.5, +1.0, -0.5) and are easily
embedded into the code as additions and subtractions. Note, however, that
frequency domain (convolutional) window filtering is applied to the REAL and
IMAG data separately, before the REAL/IMAG data is combined into a magnitude.
The operation is fast and occurs only during output. Furthermore, other
window functions are rapidly and easily implemented by selecting different
convolution coefficients.
Figure 6-12. Raised Cosine Window

[Figure panels: time domain and frequency domain]



6.8.6 Using SFFT.ASM for Spectrum Analysis 

If the SPECT_EN variable is set to 1 (true), the DSK analog output is
configured to be the computed spectrum of the analog input, beginning at
BIN_START and ending at BIN_END. The output is then viewed using an
oscilloscope, which is triggered on a positive synch pulse. The DAC output
voltage is proportional to the log magnitude of each frequency bin.
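The display value is a log magnitude, which can be computed cheaply from a floating-point number’s exponent and mantissa fields. The sketch below is written for IEEE-754 single precision on a host, not for the ’C3x floating-point format used by SFFT.ASM, and its names are hypothetical. Treating the bit pattern as a fixed-point number gives log2 exactly at powers of 2 and within about 0.09 elsewhere, which is ample for an oscilloscope display.

```c
#include <stdint.h>
#include <string.h>

/* Approximate log2(x) for a positive, normal IEEE-754 float: the exponent
 * field supplies the integer part and the mantissa bits a linear fraction. */
static float fast_log2(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bits */
    return (float)bits * (1.0f / 8388608.0f) - 127.0f;  /* scale by 2^-23, remove bias */
}

/* Log magnitude of one REAL/IMAG bin: log2(sqrt(re^2 + im^2)). */
static float bin_log_mag(float re, float im)
{
    return 0.5f * fast_log2(re * re + im * im);
}
```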

To help pass impulses with minimal magnitude errors, each DAC output sample
can be repeated up to DAC_RPT times. Also, the AIC TA register value can be
programmed to give a very high pass band. This increases the DAC output
distortion, which is a problem for audio applications but is acceptable for
visual purposes.

Also, the BIN_START and BIN_END values do not need to begin at zero or end
at SFFTSIZE/2. This can be used to show that the frequency bins repeat in the
frequency domain, as predicted by the discrete Fourier transform. The only
restrictions are the available memory and CPU processing power.


6.8.7 Using SFFT.ASM for Hilbert Transforms and Arbitrary Phase Angle Filters

If SPECT_EN is set to 0, the output is configured to be the summation of the 
reconstructed REAL and IMAG components. 

An arbitrary output phase angle is implemented by performing a complex
multiplication of the REAL and IMAG components by a complex vector
determined by the ANGLE parameter. If ANGLE = 90°, the Hilbert transform is
reconstructed from the pass-band SFFT bins covering BIN_START to BIN_END. If
ANGLE = 0.0, no phase shift occurs.
The 0° and matched 90° phase shift Hilbert transform pair is useful in
telecommunications applications, where the quadrature outputs are used to
shift the spectrum of a signal, and in radio and modem modulation schemes.

6.8.8 Raised Cosine Windowed Filters

By applying the raised cosine window to the summation of bin values, the 
REAL or IMAG filter response ripple is improved. 

The method implemented uses a series of coefficients that are applied to each 
frequency bin and then added much like an FIR filter, except in the frequency 
domain. 

The coefficient values result from both: 

□ The convolution of the response of a raised cosine function with the signal 
response 

□ The multiplication of a rectangular bandpass filter, also applied in the 
frequency domain 

A group delay, or time shift, is also present; it equals N/2 samples plus the
time it takes a signal to pass through the ADC/DAC conversion process.

In Figure 6-13 through Figure 6-16, the number of bins required is actually
WIDTH + 2 for a given pass-band bandwidth, and the signs of the coefficients
alternate (+, -, +, -). The endpoints, which are also scaled by 50%, are the
result of the window coefficients and define the edge characteristics of the
filter.

Figure 6-13. Raised Cosine Window Function (Length = 1 Bin) 
Figure 6-14. Raised Cosine Window Function (Length = 2 Bins)

Figure 6-15. Raised Cosine Window Function (Length = 3 Bins)

Figure 6-16. Raised Cosine Window Function (Length = 4 Bins)

6.8.9 Non-Windowed SFFT

A special case occurs when the SFFT is used to compute the all-pass 0° and
90° Hilbert transforms of a non-windowed, synchronized signal. Frequency bin
spreading occurs if the signal is not harmonically related to the sample
window.

For REAL summations, the input is reconstructed by scaling the 0 or DC bin 
by 50%. This scaling compensates for a 2:1 rise in signal level since all bin data 
energy, except for the 0 bin, is split equally between the positive and negative 
frequencies. 

At the 0 bin, there is no IMAG information, since no phase shift is applied to 
that bin. A DC component for an IMAG reconstruction, therefore, does not 
exist. 

Figure 6-17. N/2 SFFT R/I Bins

[Figure: N/2 REAL/IMAG SFFT bins, starting at the 0 bin]


6.8.10 Performance 


Because the SFFT needs to compute only the bins of interest within the span
of one sample period, narrow-band analysis or filtering is very efficient,
even when the effective FFT size is very large. If large numbers of bins
and/or high sampling rates are impractical for a single processor, a
traditional block-style FFT or filter may be more practical.

For example, in a filter application, only a few frequency bins may be
required; the unused bins are treated as zero, since they are not needed for
reconstruction. The maximum sampling rate (or the number of bins that can be
calculated) is given by the following equation:

$$T_{s(\min)} = (\mathrm{SFFT\_cycles\_per\_bin} \times \mathrm{bins} + \mathrm{loop\_overhead}) \times \mathrm{ns/cycle}$$

$$T_{s(\min)} = (7 \times N/2 + 52) \times 40\ \mathrm{ns}$$
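The timing equation is easy to evaluate for a given configuration. The C sketch below (hypothetical names) applies it; 40 ns per cycle corresponds to the 50 MHz ’C31 DSK used for this application.

```c
/* Minimum sample period from the text's model:
 * Ts(min) = (cycles_per_bin * bins + loop_overhead) * ns_per_cycle.
 * Names are hypothetical. */
static double sfft_min_period_ns(int bins, int cycles_per_bin,
                                 int loop_overhead, double ns_per_cycle)
{
    return ((double)cycles_per_bin * bins + loop_overhead) * ns_per_cycle;
}

/* Highest sampling rate, in kHz, that leaves time to compute all bins. */
static double sfft_max_rate_khz(int bins, int cycles_per_bin,
                                int loop_overhead, double ns_per_cycle)
{
    return 1.0e6 / sfft_min_period_ns(bins, cycles_per_bin,
                                      loop_overhead, ns_per_cycle);
}
```

With all 256 positive-frequency bins of a 512-point SFFT, Ts(min) is 73 760 ns, so the sampling rate must stay below about 13.6 kHz. The default setup computes only 65 bins (BIN_START = 32 to BIN_END = 96), giving 20 280 ns, or about 49.3 kHz, comfortably above its 20.8 kHz sampling rate.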
Note:

The loop overhead value is the time consumed by interrupt routines, data
formatting, input, and output. SFFT.ASM is not highly optimized, since it is
intended for educational purposes.

The loop can be optimized by inlining the three major functions (Input, SFFT,
and Output) to remove 3 calls and 3 returns, or 24 cycles, from the loop
overhead.

6.8.11 Loop Unrolling for High Speed Filtering

The inner loop of the SFFT consumes 5 computational cycles but executes in
6 cycles. The extra cycle results from a data bus bandwidth limitation: the
STF||STF operation immediately precedes a double load of data for the MPYF3
instruction.

This null cycle is filled by moving the filter summations within the loop. The 
summation can be done entirely within registers and requires no data path 
access. 

The +1, -1 convolutional filter coefficients for raised cosine windowing can
be hard-coded within the loop by performing subtractions that invert the sum
each time it goes through the loop. This avoids fetching coefficients over
the data bus.
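The sign-alternating sum can be checked in C. In this sketch (hypothetical name), each step replaces the running sum with value - sum, which flips the sign of everything accumulated so far, so no coefficient table is read. The final sign depends on whether the count is odd or even; SFFT.ASM deals with the same issue in its .if SFFTBINS&1 block, and this sketch simply normalizes the result so that the first value is positive.

```c
/* Alternating-sign sum built only from subtractions, in the style of the
 * SFFT.ASM inner loop (sum' = value - sum). After an odd count the result
 * is v0 - v1 + v2 - ...; after an even count it is the negative of that,
 * so the sign is corrected from the parity. Name is hypothetical. */
static double alternating_sum(const double *v, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++)
        sum = v[i] - sum;             /* flips the sign of everything so far */
    return (count & 1) ? sum : -sum;  /* always v0 - v1 + v2 - ... */
}
```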

Overall, the forward and reverse SFFT are computed at 6-7 cycles per bin,
depending on whether both REAL and IMAG outputs are required. The
general-case educational example SFFT.ASM is slightly slower than SFFT2.ASM,
which is written for filtering.

6.8.12 Fitting the Code and Data into Memory 

If the effective desired SFFT/FFT size is 512 points, then only 256 positive
frequencies need to be computed. With R/I twiddle and R/I SFFT data
associated with each bin, 1024 words of memory are required. In addition,
512 words of input buffer data are needed.
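The memory budget in this paragraph can be written as a one-line check. The C sketch below (hypothetical name) counts 4 words per bin (REAL/IMAG twiddle plus REAL/IMAG bin data) plus an N-word input buffer, the same rule that the listing checks against the DSK’s free data space of 0xE40 - 0x800 = 1600 words.

```c
/* Data words needed by the SFFT: 4 per bin (R/I twiddle + R/I bin data)
 * plus an N-word input sample buffer. Name is hypothetical. */
static int sfft_data_words(int bins, int sfft_size)
{
    return bins * 4 + sfft_size;
}
```

For 256 bins and N = 512 this gives the 1536 words described above; the default 65-bin setup needs 772.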

To maximize speed, the inner loop of the SFFT uses dual access on-chip 
memory to access data at the rate of two data moves per CPU cycle. To avoid 
program fetch conflicts, the SFFT code is loaded into the second on-chip 
SRAM block, which also holds the data buffer. 

If off-chip memory is available, excellent performance is achieved by placing 
as much SFFT bin data on-chip as possible. The input window sample buffer 
and code can be external since the main code loop easily fits inside the cache 
and the sample buffer is only accessed twice per SFFT cycle. 
Note:

The SFFT needs to calculate the difference between the most recent and the
oldest data samples only once per input sample. This value is reused for all
bin calculations and is kept in a register.

If circular or bit-reversed data storage is used, the data and twiddle
buffers are forced to power-of-2 word boundaries. In addition, the circular
addressing registers are consumed. Since the overhead of checking and
reloading the buffer pointers is minimal and allows buffer sizes that are not
powers of 2, explicit pointer testing is used in SFFT.ASM.

6.8.13 Using This Code With C

To use the functions in this code with a high level language such as C, you must 
perform context save and restore operations at the beginning and end of each 
function. 

6.8.14 TLC32040 ADC and DAC Considerations 

The application file SFFT.ASM is written to use a TLC32040 analog interface
chip (AIC), connected as in the TMS320C31 DSP Starter Kit, or DSK
(TMDS3200031). Further documentation for the DSK is included with the DSK or
can be downloaded from the Texas Instruments FTP site.

Files                          Location
Main TMS320 FTP mirror site    ftp://ftp.ti.com/mirrors/tms320bbs
C3x DSK files subdirectory     ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles


6.8.15 SFFT Summary 

□ A time signal is composed of a series of samples.

□ Each sample is an impulse.

□ The time signal is a time summation of a series of impulses.

□ The frequency spectrum of a single impulse at T = 0 is trivial to
calculate, since it is only a REAL component in each frequency bin whose
magnitude is that of the impulse.

□ The frequency spectrum of a signal is the summation of the individual
impulse responses.

□ A shift in time is a shift in phase (or phase rotation) in the frequency
domain.

□ Consider each new impulse as occurring at T = 0 and perform the time shift
on the past summation of samples as a whole.

□ The amount of phase rotation, or twiddle factor, applied to each bin is
proportional to the frequency of the bin. The phase shift is zero at DC
(n = 0) and pi radians at Fnyq (n = N/2).

□ After phase rotating each bin, simply add the new sample/impulse value.
(Don’t forget to start with each bin magnitude at zero.)

□ At this point, the Fourier transform is a forever expanding series in both
the time and frequency domains.

□ In bin n, the Nth oldest sample has rotated through n multiples of
2 x pi radians, making it completely REAL with no IMAG component.

□ At N samples of age, phase rotation = N x (n x 2 x pi/N) = n x 2 x pi.

□ A sliding rectangular window is created by subtracting the T = Nth oldest
sample while adding the newest T = 0 sample. At T = N, each frequency bin
has rotated N times, is back to 0 radians of phase, and can be properly
subtracted.

6.8.16 SFFT Algorithm 

SFFT.ASM (Example 6-18 on page 6-94) is written for the DSP beginner, but it
contains features that also make it useful to the experienced DSP
programmer. SFFT.ASM implements a continuously updated (sample-by-sample)
Fourier transform, which can be used to construct filters and analyze
spectra. It can also be used as a general-purpose DSP teaching platform.

SFFT.ASM uses a technique known as a sliding FFT (SFFT) to efficiently
calculate the spectrum of a signal on a sample-by-sample basis. The SFFT is
particularly well-suited for applications where signal analysis, filtering,
modulation, demodulation, or other forms of signal manipulation in the
frequency domain must be performed in real time. The SFFT algorithm is
similar to the DFT.

Further reading and other information includes:

□ Designer Notebook page 22, ’Fast Logarithms on a Floating Point Device’

□ APPHELP1.TXT and APPHELP2.TXT, included with the DSK software
□ Texas Instruments’ FTP site:

Files                          Location
Main TMS320 FTP mirror site    ftp://ftp.ti.com/mirrors/tms320bbs
C3x DSK files subdirectory     ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles
TMS320C3x code examples        ftp://ftp.ti.com/mirrors/tms320bbs/c3xfiles
TMS320C4x code examples        ftp://ftp.ti.com/mirrors/tms320bbs/c4xfiles


The parameter section at the start of the listing sets the SFFT parameters
that determine the SFFT output characteristics. The following rules apply:

□ BIN_LEN = BIN_END - BIN_START > 0

□ ((SFFTBINS x 4) + SFFTSIZE) < free data space

□ Time to compute all bins < sampling period

Be careful not to set the sampling rate too high while calculating many bin 
values. The SFFT must finish calculating all of its bin values within the time 
span of one sample. 

The effective Fourier series size is determined by the size of the time window 
of samples. Although this does not affect the calculation rate, it does consume 
internal memory. 

Creating a pass band around a particular signal is easy, since the signal
can be viewed either in frequency or in time by changing the setting of
SPECT_EN. With practice, you can zoom in on particular segments of frequency
by changing the start and stop bins, window size, and sampling rate.

The DAC output signal fidelity is largely determined by the TA register value
that is programmed into the AIC. No one value seems to fit all applications.
However, the following rules generally apply. If TA is small, the DAC
reconstruction filter is clocked at a faster rate. This pushes the upper
pass-band limit higher in frequency, resulting in faster slew times. This is
desirable for a spectrum analyzer output, where a fast impulse response to
frequency peaks is needed for suitable viewing. For audio applications, a
larger TA value is desired, since overclocking the DAC reconstruction filter
results in significant distortion.

The AIC master clock input is derived from the timer output pin of internal
timer 0. If the timer reference is set higher than the TLC32040 maximum clock
rate of 10 MHz, additional distortion occurs.

A TLC32040 analog interface circuit is used on the DSK since it responds
favorably when used beyond its tested limits. However, predicting performance
depends on many factors; experimentation may be required.
AIC setup registers are programmed into the AIC using a data word that is
tagged with 11b in its 2 LSBs, which signals the AIC to accept a secondary
transmit (or register program) word.
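For reference, the packing used by the A_REG and B_REG constants in SFFT.ASM can be reproduced in C. In this sketch (hypothetical names), TA or TB is shifted to bit 9, RA or RB to bit 2, and a 2-bit code occupies the LSBs; the helper that sets both LSBs models the 11b tag described above that requests a secondary word.

```c
#include <stdint.h>

/* AIC setup words packed like A_REG and B_REG in SFFT.ASM:
 * TA/TB at bit 9, RA/RB at bit 2, and a 2-bit code in the LSBs. */
static uint16_t aic_a_word(unsigned ta, unsigned ra)
{
    return (uint16_t)((ta << 9) + (ra << 2) + 0);
}

static uint16_t aic_b_word(unsigned tb, unsigned rb)
{
    return (uint16_t)((tb << 9) + (rb << 2) + 2);
}

/* Set the two LSBs to 11b: the tag that asks the AIC to accept a
 * secondary (register-programming) word next. */
static uint16_t aic_request_secondary(uint16_t primary)
{
    return (uint16_t)(primary | 0x3u);
}
```

With the listing’s defaults (TA = 6, RA = 10, TB = 25, RB = 15), the two words are 3112 and 12862.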

The DAC switched-capacitor filter clock rate is set by the TA divisor. A low
TA value, used to overclock the DAC reconstruction filter, trades signal
fidelity for faster impulse response times.

This application was designed and tested using a 50 MHz TMS320C31 DSP 
Starter Kit (TMDS3200031) which includes a TLC32040 14-bit ADC/DAC. 
Example 6-18. SFFT.ASM

;
; SFFT2.ASM
; Keith Larson
; TMS320 DSP Applications
; (C) Copyright 1996,1997,1998
; Texas Instruments Incorporated
;
; This is unsupported freeware with no implied warranties or
; liabilities. See the C3x DSK disclaimer document for details
;
; Default setup
;
;   SPECT_EN = 1
;   Fs       = 20.8 kHz (48 uS)
;   Hz/bin   = 40.7 Hz
;   Range    = 1.3 kHz - 3.9 kHz
;
; If this file is re-assembled with SPECT_EN set to 0, this will give a
; bandpass filter from 1.3-3.9 kHz having 90 degrees phase shift at all
; frequencies.
;
SFFTSIZE  .set    512               ; Sample Window length (FFT size)
BIN_START .set    32                ; Start computing SFFT at this bin
BIN_END   .set    96                ; End computing SFFT at this bin
;
ANGLE     .set    90.0              ; Filter reconstruction angle (degrees)
SPECT_EN  .set    1                 ; Enable spectrum analyzer output
RATE      .set    2                 ; Write display points RATE times each
;
TIM0_prd  .set    2                 ; AIC reference clock is TIM0
TA        .set    6                 ; DAC setup
TB        .set    25                ;
RA        .set    10                ; ADC setup
RB        .set    15                ;
;
; PARAMETERS BELOW THIS LINE ARE COMPUTED FROM THE INFORMATION
; ABOVE. THERE IS NO NEED TO MODIFY ANYTHING BELOW THIS POINT
;
BIN_LEN   .set    BIN_END-BIN_START ; Filter length in bins
SFFTBINS  .set    BIN_LEN+1         ;
;
N         .set    SFFTSIZE          ; 'N' used as shorthand for SFFTSIZE
TR        .set    0                 ; Real twiddle offset in each cell
TI        .set    1                 ; Imag
DR        .set    0                 ; Real data offset in each cell
DI        .set    1                 ; Imag
RIBINSIZE .set    2                 ; Size of R/I element pair
pi        .set    3.14159265        ; Useful in making apple pie
w         .set    2.0*pi/N          ; angle = F * 2*pi/Fs
OVM       .set    0x80              ; Use overflow mode to saturate results
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Example 6-18. SFFTASM (Continued) 



r 

; If the input parameters won't work. 

generate a descriptive error 

; for the 

user letting them know what 

to look for and maybe fix 

} 

.if 

(BIN_LEN 

< 1) 



APP MESSAGE: Calculated BIN_LEN must 

be 

>1 

.endif 




.if 

( (SFFTBINS’^4) + SFFTSIZE) 

> 

(0xE40-0x800) 

APP MESSAGE: The Fbin and data storage 

buffers are too big for the DSK 

.endif 




r 

; The SEET 

twiddles, data, and input 

r 

buffer arrays are allocated ; 

; to be placed into RAMO to avoid bus 

conflicts with program fetching; 

f 

.include 

"C3XMMRS.ASM" 

r 

r 


.start 

"DATA",0x809800 

r 

Data arrays are placed at start of RAMO 

TWIDCOEE 

. sect 

"DATA" 

} 


f 


r 


n 

. set 

BIN_START 

} 



. loop 

SFFTBINS 

r 

R/I phase or twiddle coefficients 


.float 

Kl^cos(n*w) 

} 



.float 

Kl^sin(n^w) 

f 


n 

SEETDATA 

. sdef 
.endloop 

n+1.0 

f 

} 

next 'n' 

} 


} 



. loop 

SFFTBINS 

} 

R/I frequency bin data 


.float 

0, 0 

r 

Pre-Zeroing bin data removes 

BUE 

.endloop 


} 

startup glitches 

r 

. loop 

N/2 

f 

r 

N samples of ADC input delay data 



.float 

0, 0 

r 



.endloop 


} 


;
; The application code begins here, beginning with constants that
; are used in various routines.
;
Tbase     .word   TWIDCOEF          ; Location of twiddle coefficients
Bbase     .word   SFFTDATA          ; Location of R/I SFFT Bin data
CircAddr  .word   BUF               ; Current pointer into sample data
BUFSTART  .word   BUF               ; Start address of sample data
BUFEND    .word   BUF+N             ; End address of sample data
OutBin    .float  0                 ; Current spectrum analyzer bin
MAX       .float  32000.0           ; Used synch pulse and scaling
A_REG     .word   (TA<<9)+(RA<<2)+0 ; Packed AIC register values
B_REG     .word   (TB<<9)+(RB<<2)+2 ;
C_REG     .word   00000011b         ;
;S0gctrl  .word   0x0E970300        ; Sport setup, noninverted clkx/clkr
S0gctrl   .word   0x0E973300        ; Sport setup, inverted clkx/clkr
S0xctrl   .word   0x00000111        ;
S0rctrl   .word   0x00000111        ;
NewMnsOld .word   0                 ;
K1        .set    0.99995           ; Use a value slightly less than 1.0
K2        .float  pow(K1,N)         ; K1^N oldest sample scale factor
FILTEROUT .float  0.0               ; Temp storage for SFFT filter output
Scale     .float  4.0/N             ; SFFT growth scale factor
REAL_VEC  .float  -cos(pi*ANGLE/180.0) ; filtered REAL scale factor
IMAG_VEC  .float  -sin(pi*ANGLE/180.0) ; filtered IMAG scale factor
FLOG2SC   .float  pow(2.0,-24.0)    ; Scale factor for log2 calculations
bigval    .word   0x00010000        ; Used in overflow mode saturation

;
; The main loop consists of waiting for a new ADC sample.
; When a receive interrupt occurs, the new data is loaded into the
; data delay line buffer, followed by the SFFT and output routines.
; Four dummy writes to the external bus have been added in the main
; loop to allow real time benchmarking of the three functions using
; an oscilloscope to monitor the address bus LSBs
;
          .start  "CODE",0x809E40   ; Start in last 512 words of RAM0
          .sect   "CODE"            ; (also includes DSK kernel)
main      ldi     0xE4,IE           ; Enable XINT/RINT/INT2
          idle                      ; Wait for Receive Interrupt
;
          ldi     @S0_rdata,R0      ; The first interrupt occurs shortly
          ldi     0,R0              ; after AIC init is complete, which
          sti     R0,@S0_xdata      ; will not leave enough time for SFFT
loop      idle                      ; Wait for Receive Interrupt
          sti     R0,@0x80A000      ; <1
          call    input             ; Put ADC sample in delay buffer
          sti     R0,@0x80AE03      ; <2
          call    SFFT              ; Calculate SFFT
          sti     R0,@0x80AE0E      ; <3
          call    Output            ; Output result
          sti     R0,@0x80AE3E      ; <4
          b       loop              ; Loop back and do forever
;
; The ADC data is read and buffered here
;
input     ldi     @S0_rdata,R0      ; get ADC data
          ash     -16,R0            ; Sign extend previous sample in MSBs
          float   R0,R0             ; Convert the ADC data to float
          ldi     @CircAddr,AR0     ; Load present circ buf address
          ldf     *AR0,R7           ; Multiply by 'K2' for bin stability
          mpyf    @K2,R7            ; (see text)
          stf     R0,*AR0++         ;
          cmpi    @BUFEND,AR0       ; if at end of buffer, point to start
          ldige   @BUFSTART,AR0     ;
          subrf   R0,R7             ; R7 = X[0] - K2*X[-N]
          sti     AR0,@CircAddr     ; save new 'circular' modified ptr
          stf     R7,@NewMnsOld     ;
          rets



;
; The forward and reverse SFFT are calculated within this one loop.
; The loop itself is unrolled to achieve an inner loop cycle count
; of 7 cycles per bin calculation. The inner loop contains both the
; REAL and IMAG filter summations, so if the output is for spectrum
; analysis or only one filter sum is required, one or both summations
; can be removed, giving an inner loop speed of 6 cycles/bin.
;
SFFT      ldi     @Tbase,AR0        ; R/I twiddle ptr
          ldi     @Bbase,AR1        ; R/I SFFT array ptr
          ldi     @Bbase,AR2        ; SFFT output (usually in place)
          ldi     SFFTBINS-1,RC     ; Number of bins to calculate
          ldi     RIBINSIZE,IR0     ; Size of R/I pair in array
          ldf     @NewMnsOld,R7     ; R7 = (New - K2*Old)
          ldf     0,R4              ; Zero the REAL filter sum
          ldf     0,R5              ; Zero the IMAG filter sum
          mpyf3   *+AR0(TR),*+AR1(DR),R0     ; TR*DR <- unroll from main loop
          rptb    EndSFFT                    ; Loop
          mpyf3   *+AR0(TR),*+AR1(DI),R1     ; TR*DI
          mpyf3   *+AR0(TI),*+AR1(DI),R0     ; TI*DI
||        addf3   R7,R0,R3                   ; (TR*DR + DELTA)
          mpyf3   *+AR0(TI),*+AR1(DR),R0     ; TI*DR
||        subf3   R0,R3,R3                   ; TR*DR - TI*DI + DELTA
          mpyf3   *++AR0(IR0),*++AR1(IR0),R0 ; TR*DR (used in next loop)
||        addf3   R1,R0,R2                   ; TR*DI + TI*DR
          stf     R2,*+AR2(DI)               ; Save the new Fbin values
||        stf     R3,*AR2++(IR0)             ;
          subf3   R4,R3,R4          ; REAL sum; sum'=R-sum alternates sign of
EndSFFT   subf3   R5,R2,R5          ; IMAG sum; raised cosine window coefficients
;
; For raised cosine window filters the endpoint bin values
; are scaled to 1/2 relative to the pass bins
;
          addf    R4,R4             ; Double inner +/-1 sum loop
          addf    R5,R5             ;
          subf    R3,R4             ; Subtract endpoints at 50%
          subf    R2,R5             ;
          ldi     @Bbase,AR1        ; ptr to start of R/I SFFT array
          ldf     *+AR1(DI),R2      ;
||        ldf     *+AR1(DR),R3      ;
          .if     SFFTBINS&1        ; If the loop count was odd, the
          mpyf    -1,R4             ; +,- sum result is negative
          mpyf    -1,R5             ;
          .endif                    ;
          addf    R3,R4             ;
          addf    R2,R5             ;
;
; When the SFFT is finished, the REAL/IMAG sums are scaled
; accordingly for the desired output phase angle. A 'growth'
; scale factor is also applied since the summation occurs
; over N data points.
;
ExitSFFT mpyf   @REAL_VEC,R4    ; Rotate to desired output phase
        mpyf    @IMAG_VEC,R5    ;
        addf3   R4,R5,R0        ; Sum the R/I into a REAL output
        mpyf    @Scale,R0       ; inverse of N/2 growth
        stf     R0,@FILTEROUT   ;
        rets
;
; The output section is written for both Spectrum analyzer output
; as well as REAL/IMAG filter sum outputs
;
Output: .if     SPECT_EN=0      ; If SPECT_EN=0 (disable) output either
        ldf     @FILTEROUT,R0   ; Output REAL/IMAG bin sum
        .else
;
; The Spectrum analyzer output section is bypassed
; if the spectrum analyzer is not enabled
;
        ldf     @OutBin,R0      ; Point to next output bin
        addf    1.0/RATE,R0     ; increment analyzer output pointer
        cmpf    BIN_LEN,R0      ;
        ldfge   0,R0            ;
        stf     R0,@OutBin      ;
        fix     R0,R0           ;
        bzd     Out             ;
        mpyi    RIBINSIZE,R0    ; Fbins are 2 words (R/I) per bin
        ldfz    @MAX,R0         ; If at base Fbin 0 Hz, output a synch
        ldi     @Bbase,AR0      ;
        subi    2,AR0           ; point to output bin-1 to perform
        addi    R0,AR0          ; -.5,1.0,-.5 convolutional window
        ldf     *+AR0(DI+0),R0  ; Perform convolutional window filter
||      ldf     *+AR0(DR+0),R2  ; on the R/I pairs for this output
        addf    *+AR0(DI+4),R0  ;
        addf    *+AR0(DR+4),R2  ;
        mpyf    -0.5,R0         ; Scaling coefficient for -1,+1 bins
        mpyf    -0.5,R2         ;
        addf    *+AR0(DI+2),R0  ;
        addf    *+AR0(DR+2),R2  ;
        mpyf    R0,R0           ; Calculate REAL^2 + IMAG^2 magnitude
        mpyf    R2,R2           ;
        addf    R2,R0           ;
        call    FLOG2           ; Convert to log2(), then scale
        mpyf    32,R0           ; and shift for best display
        mpyf    32,R0           ;
        subf    @MAX,R0         ;
        .endif
;
Out     fix     R0,R0           ; Convert to integer DAC output
        mpyi    @bigval,R0      ; Use Overflow mode ALU saturation
        ash     -16,R0          ;
        andn    3,R0            ; Do not request a 2nd xmit
        sti     R0,@S0_xdata    ; Output DAC value to serial port
        rets
;
; FLOG2() Ultra Fast LOG2 function
;
; computes log2(R0) and returns e8/s1/m4 accuracy float value in R0
;
FLOG2:  cmpf    0,R0            ; Exit if value is <= Zero
        ldfle   -1,R0           ; if x<=0 return -1 (error)
        retsle                  ; return if X<=0
        lsh     1,R0            ; Concatenate mantissa to exponent
        pushf   R0              ; Convert 'fast log' to int, then float
        pop     R0              ; Value is accurate but scaled by 2^24
        float   R0,R0           ;
        mpyf    @FLOG2SC,R0     ; Mpy by scale factor
        rets
;
; The startup stub is used during initialization only and can be
; overwritten by the stack or data after initialization is complete.
; Note: A DSK or RTOS communications kernel may also use the stack.
; In this case be sure to not put the stack here during debug.
;
        .entry  ST_STUB         ; Debugger starts here
ST_STUB ldp     T0_ctrl         ; Use kernel data page and stack
        ldi     @stack,SP       ;
        ldi     0,R0            ; Halt TIM0 & TIM1
        sti     R0,@T0_ctrl     ;
        sti     R0,@T0_count    ; Set counts to 0
        ldi     TIM0_prd,R0     ; Set period
        sti     R0,@T0_prd      ;
        ldi     0x2C1,R0        ; Restart both timers
        sti     R0,@T0_ctrl     ;
        ldi     @S0xctrl,R0     ;
        sti     R0,@S0_xctrl    ; transmit control
        ldi     @S0rctrl,R0     ;
        sti     R0,@S0_rctrl    ; receive control
        ldi     0,R0            ;
        sti     R0,@S0_xdata    ; DXR data value
        ldi     @S0gctrl,R0     ; Setup serial port
        sti     R0,@S0_gctrl    ; global control
;
; This section of code initializes the AIC
;
AIC_INIT LDI    0x10,IE         ; Enable only XINT interrupt
        andn    0x34,IF         ;
        ldi     0,R0            ;
        sti     R0,@S0_xdata    ;
        RPTS    0x040           ;
        LDI     2,IOF           ; XF0=0 resets AIC
        rpts    0x40            ;
        LDI     6,IOF           ; XF0=1 runs AIC
        ldi     @C_REG,R0       ; Setup control register
        call    prog_AIC        ;
        ldi     0xfffc,R0       ; Program the AIC to be real slow
        call    prog_AIC        ;
        ldi     0xfffc|2,R0     ;
        call    prog_AIC        ;
        ldi     @B_REG,R0       ; Bump up the Fs to final rate
        call    prog_AIC        ; (smaller divisors should be sent last)
        ldi     @A_REG,R0       ;
        call    prog_AIC        ;
        or      OVM,ST          ; Use the overflow mode for fast saturate
        b       main            ; the DRR before going to the main loop
;
; prog_AIC is used to transmit new timing configurations to the AIC.
; If you single step this routine, the AIC timing will be corrupted,
; causing AIC programming to fail.
; STEP OVER THIS ROUTINE USING THE F10 FUNCTION STEP
;
prog_AIC ldi    @S0_xdata,R1    ; Use original DXR data during 2nd
        sti     R1,@S0_xdata    ;
        idle                    ;
        ldi     @S0_xdata,R1    ; Use original DXR data during 2nd
        or      3,R1            ; Request 2nd XMIT
        sti     R1,@S0_xdata    ;
        idle                    ;
        sti     R0,@S0_xdata    ; Send register value
        idle                    ;
        andn    3,R1            ;
        sti     R1,@S0_xdata    ; Leave with original safe value in DXR
        ldi     @S0_rdata,R0    ; Fix receiver underrun by dummy read
        rets
;
; By placing the stack at the end of the user's runtime code, the
; maximum space is made available for applications. Essentially,
; initialization code or data can be reclaimed once it has been used.
; However, use this configuration for debug purposes
;
        .start  "STACK",$       ; This is a reminder to put the
        .sect   "STACK"         ; stack in a safe place. $ places the
stack   .word   stack           ; stack section at the current assy address
;
; install the XINT/RINT ISR branch vectors
;
        .start  "SP0VECTS",0x809FC5 ; Place ISR returns directly into
        .sect   "SP0VECTS"      ; secondary branch table
        reti                    ; XINT0
        reti                    ; RINT0
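The arithmetic behind Example 6-18 can be tried outside the ’C3x. The C sketch below is a minimal illustration under stated assumptions, not the listing's own implementation, of two ideas from it. First, the per-bin recurrence the SFFT inner loop appears to compute, F' = T·F + (new - K2·old), where T is the bin's twiddle factor scaled by a damping radius r slightly below 1 and K2 = r^N; this keeps the bin equal to the damped N-sample sum of T^m·x[n-m]. Second, the FLOG2 trick of reading a float's exponent and mantissa bits as one fixed-point number, so that log2(2^e·(1 + m)) is approximated by e + m. The FLOG2 part is rewritten for IEEE-754 single precision, which differs from the ’C3x floating-point format, and the names `sdft_update` and `fast_log2` are hypothetical. The raised-cosine windowing and alternating sums in the listing are not modeled.

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>
#include <string.h>

/* One sliding-DFT bin update (hypothetical helper).
 * F     : previous bin value
 * T     : twiddle factor r * exp(j*2*pi*k/N), with r slightly below 1
 * K2    : r^N, the damping applied to the sample leaving the window
 * x_new : newest sample; x_old: the sample from N periods ago
 * The result equals the sum over m = 0..N-1 of T^m * x[n-m]. */
static double complex sdft_update(double complex F, double complex T,
                                  double x_new, double x_old, double K2)
{
    return T * F + (x_new - K2 * x_old);
}

/* Fast log2 in the spirit of FLOG2, assuming IEEE-754 single precision:
 * x = 2^(E - 127) * (1 + m), approximated as (E - 127) + m, which is the
 * bit pattern read as a number with 23 fractional bits, minus the bias. */
static double fast_log2(float x)
{
    uint32_t bits;

    if (!(x > 0.0f))
        return -1.0;                 /* same error value as FLOG2 */
    memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing issues */
    assert((bits >> 31) == 0);       /* sign bit is clear for x > 0 */
    return (double)bits / 8388608.0 - 127.0;  /* 8388608 = 2^23 */
}
```

The log2 approximation is exact at powers of two and errs by less than about 0.09 in between, which is adequate for a log-magnitude display such as the analyzer output above.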
Chapter 7


Programming the DMA Channel 


The direct memory access (DMA) coprocessor is an on-chip peripheral that
can read from or write to any location in the memory map without interfering
with the CPU operation. The DMA channel contains its own address generators,
source and destination registers, and transfer counters. The DMA channel
can be easily programmed in C or in assembly language.

The ’C30 and ’C31 each have one DMA channel, while the ’C32 has two DMA
channels. Each ’C32 DMA channel is similar to the ’C30 and ’C31 DMA
channel, with the addition of user-configurable priorities.

This chapter provides examples for programming the DMA for the ’C3x. 


Topic Page 

7.1 Hints for DMA Programming.7-2 

7.2 When a DMA Channel Finishes a Transfer.7-3

7.3 DMA Assembly Programming Examples.7-4


7.1 Hints for DMA Programming 

The Peripherals chapter of the TMS320C3x User’s Guide describes the DMA 
channel and its operation in detail. Use the following techniques to program 
your DMA more efficiently and to avoid unexpected results: 

□ Reset the DMA register before starting it. This clears any previously
latched interrupts that may no longer exist.

□ After starting the DMA, set the IE register to enable interrupts for sync
transfer.

□ If a conflict occurs when the CPU and DMA access the memory simultaneously
on the ’C30 or ’C31, the CPU always prevails. Carefully allocate the
sections of the program in memory for faster execution. If a CPU program
access conflicts with a DMA access, enabling the cache helps if the program
is located in external memory. DMA on-chip access happens during the H3
phase. Refer to the Pipeline Operation chapter in the TMS320C3x User’s Guide
for details on CPU accesses.

If a conflict occurs during CPU-DMA access on the ’C32, the priority set
between the CPU and DMA is used to arbitrate conflicts. If the DMA channel
has lower priority than the CPU, the DMA may fail to finish a block
transfer if conflicts occur. To avoid this condition, use CPU/DMA rotating
priority in the corresponding DMA control register.

Note: Expansion and Peripheral Buses

The expansion and peripheral buses on the ’C30 cannot be accessed
simultaneously because they are multiplexed into a common port. Therefore,
DMA access to the peripheral bus along with CPU access to the expansion
bus can cause CPU-DMA conflicts. (See the TMS320C3x User’s Guide for
more information.)

□ When you use interrupt synchronization, ensure that interrupts are actually
generated; otherwise, the DMA will never complete the block transfer.

□ Use read/write synchronization when reading from or writing to serial ports
to guarantee data validity.


7.2 When a DMA Channel Finishes a Transfer 

Many applications require that you perform certain tasks after a DMA channel 
has finished a block transfer. The following are indications that the DMA has 
finished a set of transfers: 

□ The DINT bit in the IIF register is set to 1 (interrupt polling). This
requires that the TCINT bit in the DMA control register be set first. This
interrupt-polling method does not cause any additional conflict during
CPU-DMA access.

□ The transfer counter has a zero value. The transfer counter is decremented
after the DMA read operation finishes (not after the write operation), so
the final write may still be pending when the counter reaches zero.
Nevertheless, a transfer counter with a zero value can be used as an
indication of transfer completion.

□ The STAT bits in the DMA channel control register are set to 00₂. You
can poll the DMA channel control register for this value. However,
because the DMA registers are memory-mapped into the peripheral bus
address space, this option can cause further conflicts during CPU-DMA
access.
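The register sequence used by the examples in this chapter (reset the channel, program the source, destination, and counter, then start it) and the transfer-counter and STAT indications above can be sketched in C. This is a minimal sketch that assumes the register offsets used by Examples 7-1 through 7-3 and the START (bits 1-0) and STAT (bits 3-2) field positions of the DMA global control register; `dma_program`, `dma_counter_empty`, and `dma_stat_idle` are hypothetical names. The IE and ST setup is left out because those are CPU registers, not memory-mapped ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets from the DMA global control register, matching the
 * *+AR0(4), *+AR0(6), and *+AR0(8) stores in Examples 7-1 to 7-3. */
enum { DMA_CTRL = 0, DMA_SRC = 4, DMA_DST = 6, DMA_COUNT = 8 };

#define DMA_START_MASK 0x3u   /* START field, bits 1-0 (assumed layout) */
#define DMA_STAT_SHIFT 2u     /* STAT field, bits 3-2 (assumed layout)  */

/* Program one block transfer: reset first, then the address and count
 * registers, and finally write the control word with START set.
 * On a 'C30/'C31, `dma` would be (volatile uint32_t *)0x808000; here it
 * may be any array of at least 9 words. */
static void dma_program(volatile uint32_t *dma, uint32_t src, uint32_t dst,
                        uint32_t count, uint32_t ctrl)
{
    assert(dma != NULL);
    dma[DMA_CTRL]  = ctrl & ~DMA_START_MASK;  /* reset: START = 00 */
    dma[DMA_SRC]   = src;                     /* source address     */
    dma[DMA_DST]   = dst;                     /* destination        */
    dma[DMA_COUNT] = count;                   /* transfer counter   */
    dma[DMA_CTRL]  = ctrl;                    /* start the channel  */
}

/* Completion indication: the transfer counter has reached zero. */
static int dma_counter_empty(volatile const uint32_t *dma)
{
    return dma[DMA_COUNT] == 0;
}

/* Completion indication: STAT reads 00 (costs a peripheral-bus access). */
static int dma_stat_idle(volatile const uint32_t *dma)
{
    return ((dma[DMA_CTRL] >> DMA_STAT_SHIFT) & 0x3u) == 0;
}
```

Of the three indications, polling DINT avoids extra peripheral-bus traffic; the two helpers above are the register-polling alternatives and share the caveats stated in the list.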


7.3 DMA Assembly Programming Examples 

Example 7-1, Example 7-2, and Example 7-3 illustrate how to program the 
DMA channel using assembly language. 

When linking the examples, allocate section memory addresses carefully to
avoid CPU-DMA conflict. In the ’C30 or ’C31, the CPU always prevails in cases
of conflict. If a conflict occurs between a CPU program and DMA data, you can
enable the cache if the .text section is in external memory. For example, when
linking the code in Example 7-1, Example 7-2, and Example 7-3, allocate the
following sections into memory (RAM0 corresponds to on-chip RAM block 0 and
RAM1 corresponds to on-chip RAM block 1):

□ .text section into RAM0

□ .data section into RAM1

□ .bss section into RAM1

Example 7-1. Array Initialization With DMA

* TITLE: ARRAY INITIALIZATION WITH DMA
*
        .GLOBAL START
*
        .DATA
DMA     .WORD   808000H     ; DMA GLOBAL CONTROL REG ADDRESS
RESET   .WORD   0C40H       ; DMA GLOBAL CONTROL REG RESET VALUE
CONTROL .WORD   0C43H       ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE  .WORD   ZERO        ; DATA SOURCE ADDRESS
DESTIN  .WORD   _ARRAY      ; DATA DESTINATION ADDRESS
COUNT   .WORD   128         ; NUMBER OF WORDS TO TRANSFER
ZERO    .FLOAT  0.0         ; ARRAY INITIALIZATION VALUE 0.0 = 0X80000000
*
        .BSS    _ARRAY,128  ; DATA ARRAY LOCATED IN .BSS SECTION
*
        .TEXT
START   LDP     DMA         ; LOAD DATA PAGE POINTER
        LDI     @DMA,AR0    ; POINT TO DMA GLOBAL CONTROL REGISTER
        LDI     @RESET,R0   ; RESET DMA
        STI     R0,*AR0
        LDI     @SOURCE,R0  ; INITIALIZE DMA SOURCE ADDRESS REGISTER
        STI     R0,*+AR0(4)
        LDI     @DESTIN,R0  ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
        STI     R0,*+AR0(6)
        LDI     @COUNT,R0   ; INITIALIZE DMA TRANSFER COUNTER REGISTER
        STI     R0,*+AR0(8)
        OR      400H,IE     ; ENABLE INTERRUPT FROM DMA TO CPU
        OR      2000H,ST    ; ENABLE CPU INTERRUPTS GLOBALLY
        LDI     @CONTROL,R0 ; INITIALIZE DMA GLOBAL CONTROL REGISTER
        STI     R0,*AR0     ; START DMA TRANSFER
        BU      $
        .END


In Example 7-1, the DMA initializes a 128-element array to 0. The DMA sends 
an interrupt to the CPU after the transfer is completed. This program assumes 
previous initialization of the CPU interrupt vector table (specifically the DMA-to- 
CPU interrupt). The ST and IE registers are initialized for interrupt processing. 
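The listing comment 0.0 = 0X80000000 follows from the ’C3x single-precision format, in which zero is encoded with the most negative exponent (-128) rather than as an all-zero word. A small C decoder makes this concrete. It is a sketch assuming the format as described in the TMS320C3x User’s Guide: an 8-bit two's-complement exponent in bits 31-24, a sign bit in bit 23, and a 23-bit fraction in bits 22-0, with value (1.f)·2^e for positive numbers and (-2 + .f)·2^e for negative ones. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scale m by 2^e without libm (e stays within -127..127 here). */
static double scale_pow2(double m, int e)
{
    while (e > 0) { m *= 2.0; e--; }
    while (e < 0) { m *= 0.5; e++; }
    return m;
}

/* Decode a 'C3x single-precision word (assumed layout, see above). */
static double c3x_to_double(uint32_t w)
{
    int e = (int)((w >> 24) & 0xFFu);                 /* exponent field  */
    unsigned s = (w >> 23) & 1u;                      /* sign bit        */
    double f = (double)(w & 0x7FFFFFu) / 8388608.0;   /* fraction / 2^23 */

    if (e > 127)
        e -= 256;                    /* two's-complement exponent */
    assert(e >= -128 && e <= 127);
    if (e == -128)
        return 0.0;                  /* exponent -128 encodes zero */
    return scale_pow2(s ? (-2.0 + f) : (1.0 + f), e);
}
```

With this layout, the word 00000000h decodes to 1.0, not 0.0, which is why Example 7-1 copies an explicit .FLOAT 0.0 constant instead of relying on an all-zero word.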

In Example 7-2, serial port 0 is initialized to receive 32-bit data words with
an internally generated receive-bit clock and a bit-transfer rate of
8 H1 cycles/bit.

This program assumes previous initialization of the CPU interrupt vector table 
(specifically the DMA-to-CPU interrupt). The serial-port interrupt directly affects 
only the DMA; therefore, no CPU serial-port interrupt vector setting is required. 
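The IE values in these examples combine CPU-side and DMA-side enable bits. The minimal sketch below assumes the ’C30/’C31 IE layout, in which bits 0-10 route the interrupt sources to the CPU and bits 16-26 route the same sources to the DMA. The enum and function names are hypothetical. It shows how a serial-port interrupt can be enabled only for the DMA while the CPU sees only the DMA-complete interrupt (DINT).

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt sources in IE bit order (assumed 'C30/'C31 layout). */
enum ie_source {
    IE_INT0, IE_INT1, IE_INT2, IE_INT3,   /* external interrupt pins */
    IE_XINT0, IE_RINT0,                   /* serial port 0           */
    IE_XINT1, IE_RINT1,                   /* serial port 1           */
    IE_TINT0, IE_TINT1,                   /* timers                  */
    IE_DINT                               /* DMA channel             */
};

/* Enable bit that routes `src` to the CPU (IE bits 0-10). */
static uint32_t ie_to_cpu(enum ie_source src)
{
    assert(src <= IE_DINT);
    return UINT32_C(1) << src;
}

/* Enable bit that routes `src` to the DMA (IE bits 16-26). */
static uint32_t ie_to_dma(enum ie_source src)
{
    assert(src <= IE_DINT);
    return UINT32_C(1) << (16 + src);
}
```

For example, `ie_to_dma(IE_RINT0) | ie_to_cpu(IE_DINT)` gives 00200400h, the value Example 7-2 loads into IE.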
Example 7-2. DMA Transfer With Serial-Port Receive Interrupt

* TITLE: DMA TRANSFER WITH SERIAL PORT RECEIVE INTERRUPT
*
        .GLOBAL START
*
        .DATA
DMA     .WORD   808000H     ; DMA GLOBAL CONTROL REG ADDRESS
CONTROL .WORD   0D43H       ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE  .WORD   80804CH     ; DATA SOURCE ADDRESS: SERIAL PORT INPUT
DESTIN  .WORD   _ARRAY      ; DATA DESTINATION ADDRESS
COUNT   .WORD   128         ; NUMBER OF WORDS TO TRANSFER
IEVAL   .WORD   00200400H   ; IE REGISTER VALUE
RESET1  .WORD   0D40H       ; DMA RESET
        .BSS    _ARRAY,128  ; DATA ARRAY LOCATED IN .BSS SECTION
                            ; THE UNDERSCORE USED IS JUST TO MAKE IT
                            ; ACCESSIBLE FROM C (OPTIONAL)
*
        .TEXT
START   LDP     DMA         ; LOAD DATA PAGE POINTER
*
* DMA INITIALIZATION
*
        LDI     @DMA,AR0    ; POINT TO DMA GLOBAL CONTROL REGISTER
        LDI     @SPORT,AR1
        LDI     @RESET,R0
        STI     R0,*+AR1(4) ; RESET SPORT TIMER
        LDI     @RESET1,R0
        STI     R0,*AR0     ; RESET DMA
        LDI     @SPRESET,R0
        STI     R0,*AR1     ; RESET SPORT
        LDI     @SOURCE,R0  ; INITIALIZE DMA SOURCE ADDRESS REGISTER
        STI     R0,*+AR0(4)
        LDI     @DESTIN,R0  ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
        STI     R0,*+AR0(6)
        LDI     @COUNT,R0   ; INITIALIZE DMA TRANSFER COUNTER REGISTER
        STI     R0,*+AR0(8)
        OR      @IEVAL,IE   ; ENABLE INTERRUPTS
        OR      2000H,ST    ; ENABLE CPU INTERRUPTS GLOBALLY
        LDI     @CONTROL,R0 ; INITIALIZE DMA GLOBAL CONTROL REGISTER
        STI     R0,*AR0     ; START DMA TRANSFER
*
* SERIAL PORT INITIALIZATION
*
        LDI     @SRCTRL,R0   ; SERIAL-PORT RECEIVE CONTROL REG INITIALIZATION
        STI     R0,*+AR1(3)
        LDI     @STPERIOD,R0 ; SERIAL-PORT TIMER PERIOD INITIALIZATION
        STI     R0,*+AR1(6)
        LDI     @STCTRL,R0   ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
        STI     R0,*+AR1(4)
        LDI     @SGCCTRL,R0  ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
        STI     R0,*AR1
        BU      $
        .END







Example 7-3 sets up the DMA to transfer data (128 words) from an array buffer
to the serial-port-0 output register with serial-port transmit interrupt XINT0.
The DMA sends an interrupt to the CPU when the data transfer completes.

Serial port 0 is initialized to transmit 32-bit data words with an internally
generated frame sync and a bit-transfer rate of 8 H1 cycles/bit. The receive-bit
clock is internally generated and equal in frequency to one half of the ’C3x
H1 frequency.

This program assumes previous initialization of the CPU interrupt vector table
(specifically the DMA-to-CPU interrupt). The serial-port interrupt directly affects
only the DMA; therefore, no CPU serial-port interrupt vector setting is required.
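The 8 H1 cycles/bit figure can be related to the serial-port timer period of 2 (STPERIOD in Example 7-3) with a little arithmetic. The sketch below assumes, and this is an assumption rather than something the listing states, that the serial-port timer runs in clock mode (50% duty cycle) from its internal clock, which ticks at half the H1 rate. In that mode, the bit clock toggles every `period` timer ticks, so one bit lasts 2 × period ticks, or 4 × period H1 cycles. The function names are hypothetical.

```c
#include <assert.h>

/* H1 cycles per serial bit, assuming the serial-port timer is in clock
 * mode and counts its internal clock, f(H1)/2 (2 H1 cycles per tick).
 * The output toggles every `period` ticks: one bit = 2 * period ticks. */
static unsigned h1_cycles_per_bit(unsigned period)
{
    assert(period > 0);
    return 2u * 2u * period;
}

/* Resulting serial bit rate for a given H1 frequency. */
static double serial_bit_rate_hz(double f_h1_hz, unsigned period)
{
    return f_h1_hz / (double)h1_cycles_per_bit(period);
}
```

Under these assumptions a period of 2 gives 8 H1 cycles/bit; with a 16-MHz H1, that is a 2-Mbit/s serial clock.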


Note: Serial Port Transmit Synchronization

The DMA uses serial port transmit interrupt XINT0 to synchronize transfers.
Because XINT0 is generated when the transmit buffer has written the last
bit of data to the shifter, an initial CPU write to the serial port is required to
trigger XINT0 to enable the first DMA transfer.


Example 7-3. DMA Transfer With Serial-Port Transmit Interrupt

* TITLE: DMA TRANSFER WITH SERIAL PORT TRANSMIT INTERRUPT
*
        .GLOBAL START
*
        .DATA
DMA      .WORD  808000H     ; DMA GLOBAL CONTROL REG ADDRESS
CONTROL  .WORD  0E13H       ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE   .WORD  (_ARRAY+1)  ; DATA SOURCE ADDRESS
DESTIN   .WORD  808048H     ; DATA DESTIN ADDRESS: SERIAL-PORT OUTPUT REG
COUNT    .WORD  127         ; NUMBER OF WORDS TO TRANSFER = (MSG LENGTH-1)
IEVAL    .WORD  00100400H   ; IE REGISTER VALUE
         .BSS   _ARRAY,128  ; DATA ARRAY LOCATED IN .BSS SECTION
                            ; THE UNDERSCORE USED IS JUST TO MAKE IT
                            ; ACCESSIBLE FROM C (OPTIONAL)
RESET1   .WORD  0E10H       ; DMA RESET
SPORT    .WORD  808040H     ; SERIAL-PORT GLOBAL CONTROL REG ADDRESS
SGCCTRL  .WORD  04880044H   ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
SXCTRL   .WORD  111H        ; SERIAL-PORT TX PORT CONTROL REG INITIALIZATION
STCTRL   .WORD  00FH        ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
STPERIOD .WORD  00000002H   ; SERIAL-PORT TIMER PERIOD
SPRESET  .WORD  00880044H   ; SERIAL-PORT RESET
RESET    .WORD  0H          ; SERIAL-PORT TIMER RESET
*
        .TEXT
START   LDP     DMA         ; LOAD DATA PAGE POINTER
*
* DMA INITIALIZATION
*
        LDI     @DMA,AR0    ; POINT TO DMA GLOBAL CONTROL REGISTER
        LDI     @SPORT,AR1
        LDI     @RESET,R0
        STI     R0,*+AR1(4) ; RESET SPORT TIMER
        STI     R0,*AR0     ; RESET DMA
        STI     R0,*AR1     ; RESET SPORT
        LDI     @SOURCE,R0  ; INITIALIZE DMA SOURCE ADDRESS REGISTER
        STI     R0,*+AR0(4)
        LDI     @DESTIN,R0  ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
        STI     R0,*+AR0(6)
        LDI     @COUNT,R0   ; INITIALIZE DMA TRANSFER COUNTER REGISTER
        STI     R0,*+AR0(8)
        OR      @IEVAL,IE   ; ENABLE INTERRUPT FROM DMA TO CPU
        OR      2000H,ST    ; ENABLE CPU INTERRUPTS GLOBALLY
        LDI     @CONTROL,R0 ; INITIALIZE DMA GLOBAL CONTROL REGISTER
        STI     R0,*AR0     ; START DMA TRANSFER
*
* SERIAL PORT INITIALIZATION
*
        LDI     @SXCTRL,R0   ; SERIAL-PORT TX CONTROL REG INITIALIZATION
        STI     R0,*+AR1(2)
        LDI     @STPERIOD,R0 ; SERIAL-PORT TIMER PERIOD INITIALIZATION
        STI     R0,*+AR1(6)
        LDI     @STCTRL,R0   ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
        STI     R0,*+AR1(4)
        LDI     @SGCCTRL,R0  ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
        STI     R0,*AR1
*
* CPU WRITES THE FIRST WORD (TRIGGERING EVENT -> XINT IS GENERATED)
*
        LDI     @SOURCE,AR0
        LDI     *-AR0(1),R0
        STI     R0,*+AR1(8)
        BU      $
        .END




Other examples of DMA initialization include: 

□ Transfer a 256-word block of data from off-chip memory to on-chip
memory and generate an interrupt on completion. Maintain the memory
order.

DMA source address 800000h

DMA destination address 809800h

DMA transfer counter 00000100h

DMA global control 00000C53h

CPU/DMA interrupt enable (IE) 00000400h

□ Transfer a 128-word block of data from on-chip memory to off-chip
memory and generate an interrupt on completion. Invert the order of
memory: the highest addressed member of the block becomes the lowest
addressed member.


DMA source address 809800h 

DMA destination address 800000h 

DMA transfer counter 00000080h 

DMA global control 00000C93h 

CPU/DMA interrupt enable (IE) 00000400h 

□ Transfer a 200-word block of data from the serial port 0 receive register 
to on-chip memory and generate an interrupt on completion. Synchronize 
the transfer with the serial-port-0 receive interrupt. 

DMA source address 80804Ch 

DMA destination address 809C00h 

DMA transfer counter 000000C8h 

DMA global control 00000D43h 

CPU/DMA interrupt enable (IE) 00200400h 

□ Transfer a 200-word block of data from on-chip memory (RAM block 1) to the
serial port 0 transmit register and generate an interrupt on completion.
Synchronize the transfer with the serial-port-0 transmit interrupt.


DMA source address 809C00h 

DMA destination address 808048h 

DMA transfer counter 000000C8h 

DMA global control 00000E13h 

CPU/DMA interrupt enable (IE) 00100400h


□ Transfer data continuously between the serial port 0 receive register and
the serial-port-0 transmit register to create a digital loop back. Synchronize
the transfer with the serial-port-0 receive and transmit interrupts.

DMA source address 80804Ch

DMA destination address 808048h

DMA transfer counter 00000000h

DMA global control 00000303h

CPU/DMA interrupt enable (IE) 00300000h
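Every DMA global control value in this chapter decodes consistently against a single field layout. The hypothetical helper below assumes the layout described for the DMA global control register in the TMS320C3x User’s Guide: START (bits 1-0), STAT (bits 3-2), INCSRC (4), DECSRC (5), INCDST (6), DECDST (7), SYNC (bits 9-8), TC (10), and TCINT (11). Under that assumption, it rebuilds the control constants from the listings and from the list above. A reset value is the same word with START = 00, for example RESET (0C40H) compared with CONTROL (0C43H) in Example 7-1.

```c
#include <assert.h>
#include <stdint.h>

/* SYNC field values (bits 9-8): which interrupt paces the transfers. */
enum dma_sync { SYNC_NONE = 0, SYNC_SRC = 1, SYNC_DST = 2, SYNC_BOTH = 3 };

/* Single-bit fields of the DMA global control register (assumed layout). */
enum {
    DMA_INCSRC = 1u << 4,    /* increment source address            */
    DMA_DECSRC = 1u << 5,    /* decrement source address            */
    DMA_INCDST = 1u << 6,    /* increment destination address       */
    DMA_DECDST = 1u << 7,    /* decrement destination address       */
    DMA_TC     = 1u << 10,   /* stop when the counter reaches zero  */
    DMA_TCINT  = 1u << 11    /* raise DINT when the counter stops   */
};

/* Build a control word; start = 3 (binary 11) starts the channel and
 * start = 0 resets it. STAT (bits 3-2) is status and is left at 0. */
static uint32_t dma_ctrl(unsigned start, enum dma_sync sync, uint32_t flags)
{
    assert(start <= 3u);
    return (uint32_t)start | ((uint32_t)sync << 8) | flags;
}
```

For example, the digital loop back above uses `dma_ctrl(3, SYNC_BOTH, 0)`: no address updates and no TC bit, so the channel keeps running instead of stopping at a zero count.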


Chapter 8


Analog Interface Peripherals and Applications 


Analog interface peripherals are analog input/output devices that interface
directly to the ’C3x. This chapter describes these devices and their applications
in ’C3x-based systems.


Topic Page 


8.1 Analog-to-Digital Converter Interface to the TMS320C30
Expansion Bus.8-2

8.8 Hardware UART for TMS320C3x.8-70


























8.1 Analog-to-Digital Converter Interface to the TMS320C30 Expansion Bus 

Analog-to-digital converters (ADCs) and digital-to-analog converters (DACs)
are commonly required in DSP systems and interface efficiently to the I/O
expansion bus. These devices are available in many speed ranges and with
a variety of features. While some might require one or more wait states on the
I/O bus, others can be used at full speed. Figure 8-1 illustrates a ’C30 interface
to an Analog Devices AD1678 ADC. The AD1678 is a 12-bit, 5-µs converter
that allows sample rates up to 200 kHz and has an input voltage range of 10 V,
bipolar or unipolar. The converter is connected according to the manufacturer’s
specifications to provide 0-10-V operation. This interface illustrates a common
approach to connecting such devices to the ’C30. Note that the interface
requires only a minimum amount of control logic.

The AD1678 is a very flexible converter and is configurable in a number of
different operating modes. These operating modes include:

□ Byte or word data format 

□ Continuous or noncontinuous conversions 

□ Enabled or disabled chip-select function 

□ Programmable end-of-conversion indication 

This interface uses a data format of 12-bit words, rather than a byte format, to
be compatible with the ’C3x. Noncontinuous conversions are selected so that
variable sample rates can be used; continuous conversions occur at a fixed
rate of 200 kHz. With noncontinuous conversions, the host processor determines
the conversion rate by initiating conversions through write operations
to the converter.

The chip-select input must be active when accessing the device. Enabling the
chip-select function is necessary to isolate the AD1678 from other peripheral
devices connected to the expansion bus. To establish the desired operating
modes, the SYNC and 12/8 inputs to the converter are pulled high and EOCEN
is grounded, as specified in the AD1678 Data Sheet.

In this application, the converter’s chip-select is driven by XA12, which maps
this device at 804000h in I/O address space. Conversions are initiated by writing
any data value to the device. The conversion results are obtained by reading
from the device after the conversion is complete. To generate the device’s
start conversion (SC) and output enable (OE) inputs, the 74AS32 performs an
AND operation on IOSTRB and XR/W (see Figure 8-1). Therefore, the converter
is selected whenever XA12 is low; OE is driven when reads are performed,
and SC is driven when writes are performed.
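The decoding just described, together with the DAC latches of Section 8.2 at 805000h, amounts to a small truth table on XA12 and XR/W. The C model below is a hypothetical sketch of that glue logic, not code that runs on the ’C30. It assumes the ’C30 I/O expansion space spans 804000h-805FFFh and that XA12 corresponds to address bit 12.

```c
#include <assert.h>
#include <stdint.h>

/* What an I/O-space access does in the Figure 8-1 / Figure 8-3 glue logic. */
enum io_action { IO_NONE, IO_ADC_START, IO_ADC_READ, IO_DAC_LATCH };

#define IO_SPACE_FIRST 0x804000u  /* assumed 'C30 I/O expansion range */
#define IO_SPACE_LAST  0x805FFFu
#define XA12_MASK      0x001000u  /* expansion address bit 12 */

/* Hypothetical model: XA12 low selects the AD1678 (a write pulses SC to
 * start a conversion, a read drives OE to return the result); XA12 high
 * selects the DAC latches, which respond only to writes. */
static enum io_action io_decode(uint32_t addr, int is_write)
{
    assert(is_write == 0 || is_write == 1);
    if (addr < IO_SPACE_FIRST || addr > IO_SPACE_LAST)
        return IO_NONE;
    if ((addr & XA12_MASK) == 0)
        return is_write ? IO_ADC_START : IO_ADC_READ;
    return is_write ? IO_DAC_LATCH : IO_NONE;
}
```

Because only XA12 is decoded, every address in the lower half of the I/O space aliases to the ADC. This is acceptable when these are the only two expansion-bus devices.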
Figure 8-1. Interface Between the TMS320C30 and the AD1678



As with many A/D converters, the AD1678 data output lines enter a
high-impedance state at the end of a read cycle. This occurs after the output enable
(OE) or read control line goes inactive. Furthermore, the data output buffer
often requires a substantial amount of time to actually attain a full
high-impedance state. When used with the ’C30-33, device output must be fully disabled
no later than 65 ns following the rising edge of IOSTRB. This is because the
’C30 begins driving the data bus at this point if the next cycle is a write. If this
timing is not met, bus conflicts between the ’C30 and the AD1678 can occur.
This degrades system performance and may cause failure due to damaged
data bus drivers. The actual disable time for the AD1678 can be as long as
80 ns; therefore, 74LS244 buffers are used to isolate the converter outputs
from the ’C30. The buffers are enabled when the AD1678 is read and are
turned off 30.8 ns after IOSTRB goes high, meeting the ’C30-33 requirement
of 65 ns.

When data is read following a conversion, the AD1678 takes 100 ns after its 
OE control line is asserted to provide valid data at its outputs. Thus, including 
the propagation delay of the 74LS244 buffers, the total access time for reading 
the converter is 118 ns. This requires two wait states on the ’C30-33 expansion 
I/O bus. 

The two wait states required in this case are implemented using software wait
states. However, depending on the overall system configuration, you can
implement a separate wait-state generator for the expansion bus (for example,
in a case where multiple devices that require different numbers of wait states
are connected to the expansion bus). See section 4.5 Wait States and Ready
Generation on page 4-10.

Figure 8-2 shows the timing for read operations between the ’C30-33 and the
AD1678. At the beginning of the cycle, the address and XR/W lines become
valid 10 ns (t1) after the falling edge of H1. Then, 10 ns (t2) after
the next rising edge of H1, IOSTRB goes low. This begins the active portion
of the read cycle. After the control logic propagation delay of 5.8 ns (t3), the
IOR signal goes low, asserting the OE input to the AD1678. The 74LS244 buffers
take 30 ns (t4) to enable their outputs. Then, after the converter access
delay and the buffer propagation delay of 118 ns (t5, which equals 100 + 18),
data is provided to the ’C30. This provides approximately 46 ns of data setup
time before the rising edge of IOSTRB. Therefore, this design easily satisfies
the ’C30-33’s requirement of 15 ns of data setup time for reads.
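The wait-state count and the 46-ns setup figure follow from a short timing budget. The sketch below assumes a 60-ns H1 period (a 33-MHz ’C30, with H1 at half of CLKIN) and assumes that, with `ws` wait states, IOSTRB ends (1 + ws) H1 periods after the H1 edge that precedes its falling edge by t2. Both assumptions are ours rather than the text's, but together they reproduce the approximately 46 ns quoted above. The function names are hypothetical.

```c
#include <assert.h>

/* Setup margin before IOSTRB rises, assuming IOSTRB ends (1 + ws) H1
 * periods after the H1 edge it follows by t2; data arrives t3 + t5
 * after IOSTRB falls (control logic plus converter/buffer delay). */
static double read_setup_margin_ns(double h1_ns, int ws, double t2_ns,
                                   double t3_ns, double t5_ns)
{
    assert(h1_ns > 0.0 && ws >= 0);
    return (1 + ws) * h1_ns - t2_ns - t3_ns - t5_ns;
}

/* Smallest wait-state count that leaves at least `setup_ns` of setup. */
static int min_wait_states(double h1_ns, double t2_ns, double t3_ns,
                           double t5_ns, double setup_ns)
{
    int ws = 0;
    while (read_setup_margin_ns(h1_ns, ws, t2_ns, t3_ns, t5_ns) < setup_ns)
        ws++;
    return ws;
}
```

With H1 = 60 ns, t2 = 10 ns, t3 = 5.8 ns, t5 = 118 ns, and a 15-ns setup requirement, one wait state leaves a negative margin (-13.8 ns). Two wait states leave about 46.2 ns, which matches the two software wait states used in this design.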


Figure 8-2. Read Operations Timing Between the TMS320C30 and the AD1678
Unlike those on the primary bus, read and write cycles on the I/O expansion bus
are timed identically, with the following exceptions:

□ XR/W is high for reads and low for writes 

□ The data bus is driven by the ’C30 during writes (reads are the same) 

When writing to the AD1678, the 74LS244 buffers do not turn on and no data 
is transferred. The purpose of writing to the converter is only to generate a 
pulse on the converter’s SC input, which initiates a conversion cycle. When a 
conversion cycle is completed, the AD1678’s end of conversion (EOC) output 
generates an interrupt on the ’C30 to indicate that the converted data can be 
read. 

The TLC1225 is a self-calibrating 12-bit-plus-sign bipolar or unipolar converter,
which features 10-µs conversion times. The TLC1550 is a 10-bit, 5-µs converter
with a high-speed DSP interface. Both converters are parallel-interface
devices.
8.2 Digital-to-Analog Converter Interface to the TMS320C30 Expansion Bus

In many DSP systems, the requirement for generating an analog output signal 
is a consequence of sampling an analog waveform with an ADC so that it can 
be processed digitally. This digitally processed signal is then reproduced with 
a digital-to-analog converter (DAC). Interfacing the DAC to the ’C30 on the 
expansion I/O bus is also straightforward. 

Various types of DACs may be distinguished by whether or not the converters 
include: 

□ Latches to store the digital value to be converted to an analog quantity 

□ The interface to control those latches 

When latches and control logic are included, interface design is often
simplified; however, internal latches are often included only in slower DACs.

Although slower converters limit signal bandwidth, the converter design 
described in Figure 8-3 allows a reasonably wide range of signal frequencies 
to be processed and illustrates the technique of interfacing to a converter that 
uses external data latches. 

Figure 8-3 shows an interface to an Analog Devices AD565A DAC. This
device is a 12-bit, 250-ns current output DAC with an on-chip 10-V reference.
Using an off-chip current-to-voltage conversion circuit connected according to 
the manufacturer’s specifications, the converter exhibits output signal ranges 
of 0-10 V, which is compatible with the conversion range of the ADC discussed 
in the previous section. 

Because this DAC essentially performs continuous conversions based on the
digital value provided at its inputs, periodic sampling is maintained by updating
the value stored in the external latches at regular intervals. Therefore,
between updates, the digital value is stored and maintained at the latch
outputs that provide the input to the DAC. This results in a stable analog output
until the next sample update is performed.
Figure 8-3. Interface Between the TMS320C30 and the AD565A

The external data latches are 74LS377 devices that have both clock and
enable inputs. These latches serve as a convenient interface with the ’C30; the
enable inputs provide a device select function and the clock inputs latch the
data. The enable input is driven by inverted XA12, and the clock input is driven
by IOW (the AND of IOSTRB and XR/W). Therefore, data is stored in the
latches when a write is performed to I/O address 805000h. Reading this
address has no effect on the circuit.

Figure 8-4 shows the timing diagram of a write operation to the DAC latches. 
Figure 8-4. Timing Diagram for Write Operation to the DAC

Because the data is written to the latches, rather than to the DAC, the timing
requirements for these devices are fundamental to the operation of the
interface. At a minimum, these latches require:

□ Data setup time of 20 ns 

□ Enable setup time of 25 ns 

□ Disable setup time of 10 ns 

□ Data and enable hold times of 5 ns 

This design provides approximately 60 ns of enable setup, 30 ns of data setup, 
and 7.2 ns of data hold time. Therefore, the setup and hold times provided by 
this design exceed those required by the latches. The key timing parameters 
for this interface are summarized in Table 8-1. 
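As a quick arithmetic check of the margins quoted above, the following C sketch subtracts each latch requirement from the value this design provides. The numbers come from the text; the structure, array, and function names are illustrative only, and the disable setup time is omitted because the text does not state the value this design provides for it.

```c
#include <stddef.h>

/* One latch timing parameter: the 74LS377 minimum requirement and
 * the value provided by the 'C30 interface, both in nanoseconds. */
typedef struct {
    const char *name;
    double required_ns;
    double provided_ns;
} timing_param;

/* Values quoted in the text for this design. */
static const timing_param latch_params[] = {
    { "enable setup", 25.0, 60.0 },
    { "data setup",   20.0, 30.0 },
    { "data hold",     5.0,  7.2 },
};

/* Margin = provided - required; a negative margin is a violation. */
static double timing_margin(const timing_param *p)
{
    return p->provided_ns - p->required_ns;
}

/* Returns 1 if every parameter meets its minimum, 0 otherwise. */
static int all_margins_ok(const timing_param *params, size_t count)
{
    for (size_t k = 0; k < count; k++) {
        if (timing_margin(&params[k]) < 0.0)
            return 0;
    }
    return 1;
}
```

With these values the margins are 35 ns, 10 ns, and 2.2 ns, so the data hold time is the tightest of the three.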
Table 8-1. Key Timing Parameters for DAC Write Operation 

Time Interval   Event                           Time Period† 
t1              H1 falling to address valid     10 ns 
t2              XA12 to inverted XA12 delay     5 ns 
t3              H1 rising to IOSTRB falling     10 ns 
t4              IOSTRB to IOW low delay         5.8 ns 
t5              Data setup to IOW               30 ns 
t6              Data hold from IOW              7.2 ns 

† Timing for the ’C30-33 
8.3 Burr-Brown DSP101/2 and DSP201/2 Interface to TMS320C3x 

Figure 8-5 shows how to interface the ’C3x with zero glue logic to 
Burr-Brown’s DSP201/2 and DSP101/2 families of 16-bit DACs and ADCs. Using 
a ’C3x with the DSP202 dual-channel DAC and the DSP102 dual-channel ADC 
provides an efficient, low-cost, stereo digital audio interface. 


Figure 8-5. TMS320C31 Zero Glue-Logic Interface to Burr-Brown ADC and DAC 
[Figure: Burr-Brown DSP102 ADC and Burr-Brown DSP202 DAC connected to the ’C31] 

The DSP102 ADC is interfaced to the receive side of the ’C3x serial port; the 
DSP202 DAC is interfaced to the transmit side. The ADC and DAC are hard-wired 
to run in cascade mode. In this mode, when the ’C3x issues a convert command 
(CONV) to the ADC through its TCLK0 pin, both analog inputs are converted 
into two 16-bit words that are concatenated to form one 32-bit word. The ADC 
uses its SYNC signal to tell the ’C3x that serial data from the last conversion 
is being transmitted. The 32-bit word is then transmitted serially, most 
significant bit (MSB) first, from the SOUTA serial pin of the DSP102 to the DR0 
pin of the ’C3x serial port. The ’C3x is programmed to drive the analog-interface 
bit clock from its CLKX0 pin. This bit clock drives the XCLK input of both the 
ADC and the DAC. 
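The cascade-mode word described above can be modeled in portable C. This sketch assumes, for illustration only, that the first channel occupies the upper 16 bits (the bits shifted out first, since transmission is MSB first) and the second channel the lower 16 bits; check the DSP102/202 data sheets for the actual channel order. The function names are hypothetical. Unlike the bit-field union in Example 8-1, shifts and masks do not depend on how a particular compiler allocates bit fields.

```c
#include <stdint.h>

/* Convert a 16-bit pattern to a signed sample without relying on
 * implementation-defined narrowing conversions. */
static int16_t to_s16(uint16_t u)
{
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Pack two signed 16-bit samples into one 32-bit cascade word.
 * Assumed order: channel A in bits 31-16 (shifted out first),
 * channel B in bits 15-0. */
static uint32_t bb_pack(int16_t chan_a, int16_t chan_b)
{
    return ((uint32_t)(uint16_t)chan_a << 16) | (uint32_t)(uint16_t)chan_b;
}

/* Split a received 32-bit cascade word back into two signed samples. */
static void bb_unpack(uint32_t word, int16_t *chan_a, int16_t *chan_b)
{
    *chan_a = to_s16((uint16_t)(word >> 16));
    *chan_b = to_s16((uint16_t)(word & 0xFFFFu));
}
```

For example, `bb_pack(1, -1)` yields 0x0001FFFF, and `bb_unpack` recovers the two samples from that word.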

The ’C3x transmit clock can also serve as the input clock for the receive side 
of the ’C3x serial port. Because the receive clock is synchronous to the ’C3x 
internal clock, the receive clock can run at full speed, even though it is an 
external clock. 
Similarly, upon receiving a convert command (CONV), the DAC converts the 
last word received from the ’C3x. It then uses its SYNC signal to tell the ’C3x to 
begin transmitting a 32-bit word that carries the two channels of data to be 
converted. The data, transmitted from the ’C3x DX0 pin, drives both the SINA 
and SINB inputs of the DAC. 

The ’C3x is set up to transfer bits at the maximum rate of about 8 Mbit/s and 
to use a dual-channel sample rate of about 44.1 kHz by programming the 
following registers (assuming a 32-MHz CLKIN): 

Serial port: 
  Port global control register                 0x0EBC0040 
  FSX/DX/CLKX port control register            0x00000111 
  FSR/DR/CLKR port control register            0x00000111 
  Receive/transmit timer control register      0x0000000F 

Timer: 
  Timer global control register                0x000002C1 
  Timer period register                        0x000000B5 
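The timer period value in the list above follows from the pulse-mode relation described in Section 8.4.2, where the timer output rate is CLKIN/4 divided by the period register value. The sketch below assumes that relation and rounds to the nearest integer. It reproduces 0xB5 for a 32-MHz CLKIN, as well as the 0xAA and 0x9C constants that Example 8-1 lists for a 30-MHz clock. The function names are hypothetical.

```c
/* Period register value for a pulse-mode convert clock of sample_hz,
 * assuming the timer output rate is (clkin_hz / 4) / period. */
static unsigned long timer_period(double clkin_hz, double sample_hz)
{
    double ideal = (clkin_hz / 4.0) / sample_hz;
    return (unsigned long)(ideal + 0.5);   /* round to nearest */
}

/* Sample rate actually produced by a given period register value. */
static double actual_rate(double clkin_hz, unsigned long period)
{
    return (clkin_hz / 4.0) / (double)period;
}
```

Because the period is an integer, the achieved rate differs slightly from the target. A period of 0xB5 (181) at 32 MHz gives about 44.2 kHz rather than exactly 44.1 kHz.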


A single synchronous interrupt service routine is sufficient for parsing data 
and transferring it between the serial port and memory. Source code for setting 
up the serial port and timers of the ’C3x for interfacing to the DSP102 and 
DSP202 can be found on the TI BBS (file name: C3XBB.EXE). This code is 
listed in Example 8-1 through Example 8-4. 
Example 8-1. TMS320C3x/BB DSP102/202 Driver Header File 


/**************************************************************************/
/* BB.H                                                                   */
/*                                                                        */
/* TMS320C3X - BB DSP102/202 DRIVER HEADER FILE                           */
/**************************************************************************/
#include <serprt30.h>
#include <timer30.h>
#include <dma30.h>
#include <bus30.h>
#include <general.h>

/**************************************************************************/
/* COMMON STRUCTURES                                                      */
/**************************************************************************/
typedef volatile int   VI;
typedef volatile float VF;
typedef VF * volatile  VPVF;
typedef VI * volatile  VPVI;

/**************************************************************************/
/* FUNCTION PROTOTYPES                                                    */
/**************************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);

/**************************************************************************/
/* MACROS                                                                 */
/**************************************************************************/
#define BLOCK_SIZE     64               /* BUFFER SIZE                  */
#define GEN_OSC        OFF              /* GENERATE OSCILLATOR          */
#define GEN_CC         ON               /* GENERATE CONVERT COMMAND     */
#define SER_NUM        SERIAL_PORT_ONE
#define OSC_TIMER_NUM  TIMER_ZERO
#define CC_TIMER_NUM   TIMER_ONE
#define XF_NUM         1
#define ERROR_CHECK    ON
#define WAIT_BUFFERS   while(!buffer_rcvd || !buffer_xmtd);
#define RESET_FLAGS    buffer_rcvd = buffer_xmtd = FALSE
#define INIT_ARRAYS    init_arrays(t_buffer, r_buffer)

#if XF_NUM
#define RESET_BB       asm(" AND 2Fh,IOF"); asm(" OR 20h,IOF")
#define UN_RESET_BB    asm(" OR 60h,IOF")
#else
#define RESET_BB       asm(" AND 0F2h,IOF"); asm(" OR 2h,IOF")
#define UN_RESET_BB    asm(" OR 6h,IOF")
#endif
Example 8-1. TMS320C3x/BB DSP102/202 Driver Header File (Continued) 

/* TIMER PERIOD VALUES ARE BASED ON AN INPUT CLOCK OF 30 MHz */
#define CD             0xAA
#define DAT            0x9C
#define TIMER_PERIOD   CD
#define WAIT(A)        for(i=0;i<A;i++);

/**************************************************************************/
/* STRUCTURES                                                             */
/**************************************************************************/
typedef union
{
   unsigned int _intval;
   struct {
      signed int chan0 :16;
      signed int chan1 :16;
   } _bitval;
} BB_CASC_WORD;

/**************************************************************************/
/* GLOBAL VARIABLES                                                       */
/**************************************************************************/
extern int  t_buffer;       /* OUTPUT BUFFER SIZE                      */
extern int  r_buffer;       /* INPUT BUFFER SIZE                       */
extern VPVF output0;        /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input0;         /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer0;   /* OUTPUT DATA BUFFER FOR ISR/BB           */
extern VPVF input_xfer0;    /* INPUT DATA BUFFER FOR ISR/BB            */
extern VPVF output1;        /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input1;         /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer1;   /* OUTPUT DATA BUFFER FOR ISR/BB           */
extern VPVF input_xfer1;    /* INPUT DATA BUFFER FOR ISR/BB            */
extern VI   buffer_rcvd;    /* CPU-ISR COMM FLAG (INPUT)               */
extern VI   buffer_xmtd;    /* CPU-ISR COMM FLAG (OUTPUT)              */
extern VI   r_index;        /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI   t_index;        /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI   i;              /* GENERIC COUNTER VARIABLE                */

/**************************************************************************/
/* FUNCTION PROTOTYPES                                                    */
/**************************************************************************/

/***********************/
/* BB DRIVER FUNCTIONS */
/***********************/
void init_arrays(int t_buffer_size, int r_buffer_size);
void init_bb(int period_value);

#if SER_NUM
void c_int07(void);
#else
void c_int05(void);
#endif
Example 8-2. TMS320C3x - BB DSP102/202 Driver 

/**************************************************************************/
/* BBDRVR.C                                                               */
/*                                                                        */
/* TMS320C3X - BB DSP102/202 DRIVER                                       */
/**************************************************************************/
#include <math.h>
#include <stdlib.h>
#include <bb.h>

/**************************************************************************/
/* GLOBAL VARS                                                            */
/**************************************************************************/
int  t_buffer = BLOCK_SIZE;   /* OUTPUT BUFFER SIZE                      */
int  r_buffer = BLOCK_SIZE;   /* INPUT BUFFER SIZE                       */
VPVF output0;                 /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input0;                  /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer0;            /* OUTPUT DATA BUFFER FOR ISR/BB           */
VPVF input_xfer0;             /* INPUT DATA BUFFER FOR ISR/BB            */
VPVF output1;                 /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input1;                  /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer1;            /* OUTPUT DATA BUFFER FOR ISR/BB           */
VPVF input_xfer1;             /* INPUT DATA BUFFER FOR ISR/BB            */
VI   buffer_rcvd = FALSE;     /* CPU-ISR COMM FLAG (INPUT)               */
VI   buffer_xmtd = FALSE;     /* CPU-ISR COMM FLAG (OUTPUT)              */
VI   r_index = 0;             /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
VI   t_index = 0;             /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
VI   i;                       /* GENERIC COUNTER VARIABLE                */

/**************************************************************************/
/* FUNCTION DECLARATIONS                                                  */
/**************************************************************************/

/**************************************************************************/
/* VOID C_INT05() OR C_INT07():                                           */
/* ISR FOR HANDLING DATA TRANSFER BETWEEN C3X SERIAL PORT                 */
/* ONE AND THE A/D, D/A. ASSUMES SYNCHRONOUS OPERATION.                   */
/**************************************************************************/
#if SER_NUM
void c_int05(void) { }
void c_int07(void)
#else
void c_int07(void) { }
void c_int05(void)
#endif
{
   BB_CASC_WORD temp;
   VPVF swap;
Example 8-2. TMS320C3x - BB DSP102/202 Driver (Continued) 

   /* DSP102/202 TRANSFER TWO SIXTEEN-BIT WORDS REPRESENTING   */
   /* BOTH CHANNELS IN ONE THIRTY-TWO-BIT WORD. EXTRACT INTO    */
   /* THE INPUT_XFER BUFFERS                                    */
   temp._intval = SERIAL_PORT_ADDR(SER_NUM)->r_data;
   input_xfer0[r_index] = temp._bitval.chan0;
   input_xfer1[r_index] = temp._bitval.chan1;

   /* WRITE OUTPUT_XFER BUFFER VALUE BY CASCADING BOTH CHANNELS */
   temp._bitval.chan0 = output_xfer0[t_index];
   temp._bitval.chan1 = output_xfer1[t_index];
   SERIAL_PORT_ADDR(SER_NUM)->x_data = temp._intval;

   /* CHECK IF BUFFERS ARE FULL */
   if(++r_index == r_buffer)
   {
      /* CHECK CPU SYNCHRONIZATION FLAG */
#if ERROR_CHECK
      /* if(buffer_rcvd == TRUE) error_in_real_time(); */
      if(buffer_rcvd == TRUE) for(;;);
#endif
      swap        = input0;          /* SWAP PROCESSOR AND ISR BUFFERS */
      input0      = input_xfer0;
      input_xfer0 = swap;
      swap        = input1;
      input1      = input_xfer1;
      input_xfer1 = swap;
      r_index     = 0;
      buffer_rcvd = TRUE;
   }

   if(++t_index == t_buffer)
   {
      /* CHECK CPU SYNCHRONIZATION FLAG */
#if ERROR_CHECK
      /* if(buffer_xmtd == TRUE) error_in_real_time(); */
      if(buffer_xmtd == TRUE) for(;;);
#endif
      swap         = output0;        /* SWAP PROCESSOR AND ISR BUFFERS */
      output0      = output_xfer0;
      output_xfer0 = swap;
      swap         = output1;
      output1      = output_xfer1;
      output_xfer1 = swap;
      t_index      = 0;
      buffer_xmtd  = TRUE;
   }
}
Example 8-2. TMS320C3x - BB DSP102/202 Driver (Continued) 

/**************************************************************************/
/* INIT_ARRAYS() : INITIALIZE DATA ARRAY PARAMETERS                       */
/**************************************************************************/
void init_arrays(int t_buffer, int r_buffer)
{
   int i;

   /***********************************/
   /* INITIALIZE AND ZERO FILL ARRAYS */
   /***********************************/
   if(!(input0       = (float *) calloc(r_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(output0      = (float *) calloc(t_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(input_xfer0  = (float *) calloc(r_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(output_xfer0 = (float *) calloc(t_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(input1       = (float *) calloc(r_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(output1      = (float *) calloc(t_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(input_xfer1  = (float *) calloc(r_buffer, sizeof(float)))) { heap_overflow(); }
   if(!(output_xfer1 = (float *) calloc(t_buffer, sizeof(float)))) { heap_overflow(); }

   /* calloc() CLEARS ALL BITS, BUT AN ALL-ZERO BIT PATTERN IS 1.0, NOT  */
   /* 0.0, IN THE C3X FLOATING-POINT FORMAT, SO WRITE 0.0 EXPLICITLY      */
   for(i = 0; i < t_buffer; i++)
   {
      output0[i] = output_xfer0[i] = 0.0;
      output1[i] = output_xfer1[i] = 0.0;
   }
}

/**************************************************************************/
/* INIT_BB(): INITIALIZE COMMUNICATIONS TO DSP102/202                     */
/**************************************************************************/
void init_bb(int period_value)
{
   /* RESET D/A, MAKE SURE RESET IS HELD LOW SUFFICIENTLY (?) LONG */
   RESET_BB;
   WAIT(50);

#if GEN_OSC
   /* CONFIGURE C3X TIMER AS BB A/D OSC */
   TIMER_ADDR(OSC_TIMER_NUM)->gcontrol = 0x0;
   TIMER_ADDR(OSC_TIMER_NUM)->counter  = 0x0;
   TIMER_ADDR(OSC_TIMER_NUM)->period   = 0x0;
   TIMER_ADDR(OSC_TIMER_NUM)->gcontrol = FUNC | GO | HLD_ | CP_ | CLKSRC;
#endif
Example 8-2. TMS320C3x - BB DSP102/202 Driver (Continued) 

   /* CONFIGURE SERIAL PORT */
   SERIAL_PORT_ADDR(SER_NUM)->gcontrol      = 0x0;
   SERIAL_PORT_ADDR(SER_NUM)->s_x_control   = CLKXFUNC | DXFUNC | FSXFUNC;
   SERIAL_PORT_ADDR(SER_NUM)->s_r_control   = CLKRFUNC | DRFUNC | FSRFUNC;
   SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = 0x0F;
   SERIAL_PORT_ADDR(SER_NUM)->s_rxt_period  = 0x0;
   SERIAL_PORT_ADDR(SER_NUM)->gcontrol      = XCLKSRCE | XLEN_32 | RLEN_32 |
                                              XINT | XRESET | RRESET;

   /* CLEAR SERIAL TRANSMIT DATA */
   SERIAL_PORT_ADDR(SER_NUM)->x_data = 0x0;

   /* TAKE A/D, D/A OUT OF RESET, (OPTIONALLY) CLEAR THE INT FLAG REG, */
   /* ENABLE THE APPROPRIATE SERIAL PORT TRANSMIT INT AND ENABLE       */
   /* GLOBAL INTERRUPTS                                                */
   UN_RESET_BB;
   CL_INT_FL_REG;

#if SER_NUM
   EN_SER_PORT_XMT_INT_1;
#else
   EN_SER_PORT_XMT_INT_0;
#endif

   EN_GLOBAL_INTS;

#if GEN_CC
   /* CONFIGURE C3X TIMER 1 AS BB A/D, D/A CONVERT CLOCK */
   TIMER_ADDR(CC_TIMER_NUM)->gcontrol = 0x0;
   TIMER_ADDR(CC_TIMER_NUM)->counter  = 0x0;
   TIMER_ADDR(CC_TIMER_NUM)->period   = period_value;
   TIMER_ADDR(CC_TIMER_NUM)->gcontrol = FUNC | GO | HLD_ | CLKSRC;
#endif
}
Example 8-3. General Macro Definitions 

/**************************************************************************/
/* general.h  v4.2                                                        */
/* Copyright (c) 1991 Texas Instruments Incorporated                      */
/**************************************************************************/
#ifndef _GENERAL
#define _GENERAL

/**************************************************************************/
/* COMMON MACRO DEFINITIONS                                               */
/**************************************************************************/
#ifndef OFF
#define OFF     0x00
#endif

#ifndef ON
#define ON      0x01
#endif

#ifndef FALSE
#define FALSE   0x00
#endif

#ifndef TRUE
#define TRUE    0x01
#endif

#ifndef CLEAR
#define CLEAR   0x00
#endif

#ifndef SET
#define SET     0x01
#endif
Example 8-3. General Macro Definitions (Continued) 

/**************************************************************************/
/* GENERAL C3x MACROS                                                     */
/**************************************************************************/
#ifndef INIT_XF_PINS
#define INIT_XF_PINS            asm(" LDI 00h,IOF")
#endif

#ifndef CL_INT_FL_REG
#define CL_INT_FL_REG           asm(" LDI 0h,IF")
#endif

#ifndef EN_GLOBAL_INTS
#define EN_GLOBAL_INTS          asm(" OR 2000h,ST")
#endif

#ifndef EN_SER_PORT_XMT_INT_0
#define EN_SER_PORT_XMT_INT_0   asm(" OR 10h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_0
#define EN_SER_PORT_RCV_INT_0   asm(" OR 20h,IE")
#endif

#ifndef EN_SER_PORT_XMT_INT_1
#define EN_SER_PORT_XMT_INT_1   asm(" OR 40h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_1
#define EN_SER_PORT_RCV_INT_1   asm(" OR 80h,IE")
#endif

#ifndef ENABLE_CACHE
#define ENABLE_CACHE            asm(" OR 800h,ST")
#endif

#endif /* #ifndef _GENERAL */
Example 8-4. Common Driver Header File 

/**************************************************************************/
/* COMMDRVR.H                                                             */
/*                                                                        */
/* TMS320C3X - COMMON DRIVER HEADER FILE                                  */
/**************************************************************************/
#include <c30_per.h>

/**************************************************************************/
/* COMMON STRUCTURES                                                      */
/**************************************************************************/
typedef volatile int   VI;
typedef volatile float VF;
typedef VF * volatile  VPVF;
typedef VI * volatile  VPVI;

/**************************************************************************/
/* FUNCTION PROTOTYPES                                                    */
/**************************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);
8.4 TLC32040 Interface to the TMS320C3x 

Figure 8-6 shows how to interface the ’C3x with zero glue logic to a Texas 
Instruments TLC32040 14-bit analog interface circuit (AIC). The following 
sections describe the steps required to initialize the ’C3x timer and serial 
port, and to reset and program the TLC32040. 


Figure 8-6. TMS320C3x-to-TLC32040 Interface 
[Figure: ’C3x connected to the TLC32040; the TLC32040 provides the analog in and analog out signals] 

8.4.1 Resetting the Analog Interface Circuit 


The ’C31’s XF0 signal is connected to the RESET signal of the AIC. By toggling 
the RESET signal, the ’C31 can reset the AIC. This is achieved by executing 
the following instructions: 

        rpts   40          ; Execute next instruction 41 times (count + 1)
        ldi    2h,IOF      ; Pull AIC into reset (XF0 = output, low)
        ldi    6h,IOF      ; Pull AIC out of reset (XF0 = output, high)
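The IOF values used here (2h and 6h for XF0) and by the RESET_BB and UN_RESET_BB macros in Example 8-1 (20h and 60h for XF1) follow one pattern: for each XF pin, one bit selects output mode and the next bit sets the output level. The sketch below assumes the TMS320C3x IOF layout (XF0: I/O bit 1, output bit 2; XF1: I/O bit 5, output bit 6); the function name is hypothetical.

```c
/* IOF value that configures pin XF<xf> (xf = 0 or 1) as an output
 * driving the given level. Assumed layout: XF0 uses bits 1-2 and
 * XF1 uses bits 5-6, so the XF1 field is the XF0 field shifted by 4. */
static unsigned iof_xf_output(unsigned xf, int level_high)
{
    unsigned shift = 4u * xf;          /* 0 for XF0, 4 for XF1 */
    unsigned value = 0x2u << shift;    /* I/O bit: 1 = output   */
    if (level_high)
        value |= 0x4u << shift;        /* OUT bit: drive high   */
    return value;
}
```

With this helper, `iof_xf_output(0, 0)` gives the 2h that holds the AIC in reset, and `iof_xf_output(0, 1)` gives the 6h that releases it.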
8.4.2 Initializing the TMS320C31 Timer 

The ’C31’s timer output (TCLK0) is connected to the AIC’s master clock 
(MCLK) input. MCLK drives all the key logic signals of the AIC, such as the 
shift clock, the switched-capacitor filter clocks, and the ADC and DAC timing 
signals. The timer pulses the TCLK0 signal whenever the ’C31 timer counter 
register (memory mapped to 0x808024) counts up to the value in the timer 
period register (memory mapped to 0x808028). The timer counter register 
then resets to 0, and the cycle repeats. (For a detailed description of the ’C31 
timer, see the TMS320C3x User’s Guide.) Because the maximum frequency of 
the ’C31’s timer differs from the maximum and minimum master clock 
frequencies of the AIC, observe the following constraints: 

□ Minimum Timer Period Register Value. The ’C31 running at 50 MHz can 
generate a maximum timer frequency of 12.5 MHz (CLKIN/4), which is 
above the AIC’s tested master clock frequency maximum of 10 MHz. If you 
use frequencies beyond those listed in the TLC32040 Data Sheet, the 
resulting performance can be unpredictable. If the timer is run in pulse mode 
(control value 0x2C1), the minimum period of 1 results in a 12.5-MHz 
master pulse rate, and a period of 2 results in 6.25 MHz. See the TLC32040 
Data Sheet for more information. 

□ Maximum Timer Period Register Value. The AIC’s minimum master 
clock frequency is 75 kHz. Given the ’C31’s maximum timer frequency of 
12.5 MHz and the AIC’s minimum master clock frequency, the value in the 
’C31’s timer period register must be no more than 165: 12.5 MHz / 75 kHz 
= 166.7, and subtracting 1 from the truncated result (166 - 1 = 165) keeps 
MCLK safely above the minimum. The TLC32040 specification states a 
minimum clock frequency because the internal signals of the AIC are stored 
on capacitors that must be periodically refreshed. 
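The two constraints above can be checked numerically. This sketch assumes pulse mode, where MCLK = CLKIN / (4 × period), together with the 75-kHz and 10-MHz limits quoted above. The macro and function names are hypothetical.

```c
#define AIC_MCLK_MIN_HZ 75.0e3   /* minimum MCLK quoted above        */
#define AIC_MCLK_MAX_HZ 10.0e6   /* maximum tested MCLK quoted above */

/* Pulse-mode timer output frequency; period must be at least 1. */
static double mclk_pulse_mode(double clkin_hz, unsigned period)
{
    return clkin_hz / (4.0 * (double)period);
}

/* 1 if the period keeps MCLK within the AIC limits, 0 otherwise. */
static int period_ok(double clkin_hz, unsigned period)
{
    double f;
    if (period == 0)
        return 0;                 /* a period of 0 is not allowed in pulse mode */
    f = mclk_pulse_mode(clkin_hz, period);
    return f >= AIC_MCLK_MIN_HZ && f <= AIC_MCLK_MAX_HZ;
}
```

At a 50-MHz CLKIN, a period of 1 (12.5 MHz) fails the upper limit, while periods of 2 and 165 are both within range.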

The following ’C31 assembly code initializes timer 0 in clock mode with a 
timer period of 1. For the 50-MHz ’C31 described above, this generates a 
6.25-MHz square wave on the TCLK0 pin: 


TGCR0    .set   808020h       ; Timer 0 global control register
TCNT0    .set   808024h       ; Timer 0 counter register
TPR0     .set   808028h       ; Timer 0 period register
TIMVAL   .word  3C1h          ; Timer global control register value

         ldp    @TGCR0        ; Set data page
         ldi    0h,R4         ; Initialize R4 to zero
         ldi    1h,R0         ; Initialize R0 to 1
         sti    R4,@TGCR0     ; Reset timer 0
         sti    R0,@TPR0      ; Store timer 0 period
         sti    R4,@TCNT0     ; Reset timer 0 counter
         ldi    @TIMVAL,R7    ; Load timer control value
         sti    R7,@TGCR0     ; Start timer 0
A period of 0 is not allowed in pulse mode. If the timer is run in clock mode, the 
resulting output is a square wave at half the pulse-mode frequency. A period 
of 0 is allowed in clock mode and results in a 12.5-MHz clock. 
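The control values 3C1h (clock mode) and 2C1h (pulse mode) used in this section differ only in the clock/pulse bit. The sketch below builds both values from named fields. The bit positions are assumptions taken from the TMS320C3x timer global control register layout. The macro names mirror FUNC, GO, HLD_, CP_, and CLKSRC in Example 8-2, but they are defined locally here and may not match the definitions in TI's timer30.h.

```c
/* Assumed timer global control register bits. */
#define T_FUNC    (1u << 0)   /* TCLK pin is a timer pin        */
#define T_GO      (1u << 6)   /* reset and start the counter    */
#define T_HLD_    (1u << 7)   /* 1 = not held, counter runs     */
#define T_CP_     (1u << 8)   /* 1 = clock mode, 0 = pulse mode */
#define T_CLKSRC  (1u << 9)   /* 1 = internal clock source      */

/* Control word for a free-running timer on the internal clock. */
static unsigned timer_ctrl(int clock_mode)
{
    unsigned v = T_FUNC | T_GO | T_HLD_ | T_CLKSRC;
    if (clock_mode)
        v |= T_CP_;
    return v;
}
```

Under these assumptions, `timer_ctrl(1)` gives 3C1h and `timer_ctrl(0)` gives 2C1h, which match the two control values used in this chapter.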


8.4.3 Initializing the TMS320C31 Serial Port 

This section explains how to initialize the: 

□ ’C31 serial port 

□ ’C31 serial-port global control register (memory mapped to 0x808040) 

□ FSX/DX/CLKX control register (memory mapped to 0x808042) 

□ FSR/DR/CLKR control register (memory mapped to 0x808043) 

For a detailed description of the ’C31 serial port, see the TMS320C3x User’s 
Guide. 

Example 8-5 shows the assembly code that initializes the serial port global 
control register (SGCR0) for the ’C31 as follows: 

1) Issue transmit and receive resets 

2) Enable receive and transmit interrupts 

3) Set 16-bit receive and transmit transfers 

4) Set FSX and FSR, CLKX and CLKR active low 

5) Set continuous mode 

6) Set variable data rate transfers 

See the example code supplied with the DSP for help with setting up the AIC. 


Example 8-5. Initialize the Serial Port Global Control Register 

SGCR0    .set   808040h       ; Serial port 0 global control register
SPCX0    .set   808042h       ; Serial port 0 FSX/DX/CLKX control reg.
SPCR0    .set   808043h       ; Serial port 0 FSR/DR/CLKR control reg.
SINIT0   .word  0E973300h     ; Enable RINT & 16-bit transfers
SINIT1   .word  111h          ; Configure as serial port pins

         ldp    @SGCR0        ; Set data page
         ldi    0h,R4         ; Initialize R4 to zero
         sti    R4,@SGCR0     ; Reset serial port 0
         ldi    @SINIT1,R7    ; Reset and
         sti    R7,@SPCX0     ; initialize serial port
         sti    R7,@SPCR0     ; initialize serial port
         ldi    @SINIT0,R7    ; Reset and
         sti    R7,@SGCR0     ; initialize serial port
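The value 0E973300h in Example 8-5 can be broken down into named fields. This sketch assumes the TMS320C3x serial-port global control register layout, with the bit positions given in the comments; the macro and function names are local to the sketch. Under that layout, the same set of fields, with 32-bit lengths and an internal transmit clock, also reproduces the 0x0EBC0040 value listed for the Burr-Brown interface in Section 8.3.

```c
#include <stdint.h>

#define SG_BIT(n)   ((uint32_t)1 << (n))
#define SG_XCLKSRCE SG_BIT(6)    /* internal transmit clock          */
#define SG_XVAREN   SG_BIT(8)    /* variable data rate, transmit     */
#define SG_RVAREN   SG_BIT(9)    /* variable data rate, receive      */
#define SG_CLKXP    SG_BIT(12)   /* CLKX active low                  */
#define SG_CLKRP    SG_BIT(13)   /* CLKR active low                  */
#define SG_FSXP     SG_BIT(16)   /* FSX active low                   */
#define SG_FSRP     SG_BIT(17)   /* FSR active low                   */
#define SG_XLEN(n)  ((uint32_t)(n) << 18)  /* 0..3 = 8, 16, 24, 32 bits */
#define SG_RLEN(n)  ((uint32_t)(n) << 20)
#define SG_XINT     SG_BIT(23)   /* transmit interrupt enable        */
#define SG_RINT     SG_BIT(25)   /* receive interrupt enable         */
#define SG_XRESET   SG_BIT(26)   /* 1 = transmit side out of reset   */
#define SG_RRESET   SG_BIT(27)   /* 1 = receive side out of reset    */

/* Settings listed for Example 8-5 (16-bit, active-low, variable rate). */
static uint32_t sgcr_aic_example(void)
{
    return SG_XVAREN | SG_RVAREN | SG_CLKXP | SG_CLKRP | SG_FSXP | SG_FSRP
         | SG_XLEN(1) | SG_RLEN(1) | SG_XINT | SG_RINT | SG_XRESET | SG_RRESET;
}

/* Settings for the Burr-Brown interface (32-bit, internal CLKX). */
static uint32_t sgcr_bb_example(void)
{
    return SG_XCLKSRCE | SG_XLEN(3) | SG_RLEN(3)
         | SG_XINT | SG_RINT | SG_XRESET | SG_RRESET;
}
```

Building control words from named fields in this way makes each setting easier to check against the list above than a bare hexadecimal constant.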
8.4.4 Initializing the AIC 

Once the ’C31 supplies MCLK, initializes its serial port, and resets the AIC, you 
can initialize the AIC to a specified sample rate. The AIC sampling rate is 
determined by the values of two registers (counter A and counter B) in each of 
the AIC’s transmit and receive sections. These values are loaded into the 
respective counter whenever the counter counts down to 0. Tx counters A and 
B determine the D/A conversion timing; Rx counters A and B determine the 
A/D conversion timing. For more information, see the TLC32040 AIC Data 
Sheet. The formula for the conversion frequency is given in Equation 8-1. 

Equation 8-1. Conversion Frequency 

$$\text{Conversion\_frequency} = \frac{\text{MCLK}}{2 \times A \times B}$$

To ensure that the switched-capacitor lowpass and bandpass filters meet their 
transfer-function characteristics, the clock frequency of the switched-capacitor 
filters must be 288 kHz. Otherwise, the upper and lower cutoff frequencies of 
the lowpass and bandpass filters are scaled accordingly. Equation 8-2 gives 
the switched-capacitor filter clock frequency. 

Equation 8-2. Switched-Capacitor Filter Frequency 

$$\text{SCF\_clock\_frequency} = \frac{\text{MCLK}}{2 \times A}$$

For example, for an 8-kHz sampling rate with an MCLK of 6.25 MHz, 
Equation 8-2 gives a Tx counter A value of 11 [A = MCLK / (2 × SCF) = 
6.25 MHz / 576 kHz ≈ 10.85]. Equation 8-1 then gives a Tx counter B value of 
36 [B = MCLK / (2 × A × Conversion_frequency) = 6.25 MHz / 176 kHz ≈ 35.5]. 
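The worked example above can be reproduced with a short C sketch of Equations 8-1 and 8-2. Rounding to the nearest integer is an assumption that matches the example (10.85 becomes 11, and 35.5 becomes 36). The type and function names are illustrative.

```c
typedef struct {
    long a;            /* counter A value                        */
    long b;            /* counter B value                        */
    double actual_hz;  /* conversion frequency these values give */
} aic_divisors;

/* Round a positive value to the nearest integer. */
static long round_pos(double x)
{
    return (long)(x + 0.5);
}

static aic_divisors aic_counters(double mclk_hz, double scf_hz, double fs_hz)
{
    aic_divisors d;
    d.a = round_pos(mclk_hz / (2.0 * scf_hz));                /* Eq. 8-2 solved for A */
    d.b = round_pos(mclk_hz / (2.0 * (double)d.a * fs_hz));   /* Eq. 8-1 solved for B */
    d.actual_hz = mclk_hz / (2.0 * (double)d.a * (double)d.b); /* Eq. 8-1 */
    return d;
}
```

For MCLK = 6.25 MHz, SCF = 288 kHz, and a target of 8 kHz, this gives A = 11 and B = 36, and the actual conversion rate is about 7891 Hz.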

To initialize the AIC’s Tx counter A and B registers, you must send a primary 
communication followed by a secondary communication (as explained in the 
following sections). Primary communications load values into the D/A, while 
secondary communications load the AIC’s internal registers, such as the 
control register, Tx counters A and B, and Rx counters A and B. 
8.4.4.1 Primary Communications 

Primary communications have a data value in the 14 MSBs (D15-D2) and a 
mode selection in the two least significant bits (LSBs) (D1-D0). This format is 
shown in Figure 8-7. 

The AIC sends the data value to the DAC and enables one of the modes shown 
in Table 8-2, depending on the two LSBs. 


Figure 8-7. Primary Communication Data Format 

D15-D2          D1-D0 
DAC value       Mode selection 


Table 8-2. Primary Communications Mode Selection 

LSBs   Mode 
00     Tx counter A ← TA,        Rx counter A ← RA 
       Tx counter B ← TB,        Rx counter B ← RB 
01     Tx counter A ← TA + TA',  Rx counter A ← RA + RA' 
       Tx counter B ← TB,        Rx counter B ← RB 
10     Tx counter A ← TA - TA',  Rx counter A ← RA - RA' 
       Tx counter B ← TB,        Rx counter B ← RB 
11     Tx counter A ← TA,        Rx counter A ← RA 
       Tx counter B ← TB,        Rx counter B ← RB 


The second and third modes use the TA' and RA' registers to advance or slow 
down the sampling frequency by shortening or lengthening the sample period, 
respectively. This is particularly useful in modem applications, where it can 
enhance signal-to-noise performance, perform frequency-tracking functions, 
and generate nonstandard modem frequencies. 
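A primary communication word can be built from a 14-bit DAC value and a 2-bit mode code, following Figure 8-7. This sketch assumes two's-complement DAC data and clamps out-of-range values. The function name is hypothetical. The DAC ISR in Example 8-6 gets the same result for mode 00 by clearing the two LSBs of its output value (andn 3,R3).

```c
#include <stdint.h>

/* Build a 16-bit primary word: D15-D2 = 14-bit two's-complement DAC
 * value (assumed), D1-D0 = mode selection from Table 8-2. Values outside
 * -8192..8191 are clamped to the 14-bit range. */
static uint16_t aic_primary(int dac_value, unsigned mode)
{
    if (dac_value > 8191)
        dac_value = 8191;
    if (dac_value < -8192)
        dac_value = -8192;
    return (uint16_t)((((unsigned)dac_value & 0x3FFFu) << 2) | (mode & 0x3u));
}
```

For example, `aic_primary(0, 3)` produces 0x0003, a zero DAC value with both LSBs set to 11, which requests a secondary communication.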

8.4.4.2 Secondary Communications 

Secondary communication follows a primary communication that has both 
LSBs set to 11. This secondary communication programs the AIC by loading 
the A, A', B, or control registers. Figure 8-8 shows the secondary 
communication data format. The TA, RA, TB, and RB values are unsigned; the 
TA' and RA' values are in signed 2s-complement format. The control register 
enables bandpass filters and asynchronous transmit/receive, enables and 
disables the auxiliary inputs, and changes the input gain. 
Table 8-3 describes the control register bit fields. 

Figure 8-8. Secondary Communication Data Format 

D15-D14  D13-D9                        D8-D7  D6-D2                         D1-D0 
X X      TA register value (unsigned)  X X    RA register value (unsigned)  0 0 

D15      D14-D9                        D8     D7-D2                         D1-D0 
X        TA' register value (signed    X      RA' register value (signed    0 1 
         2s complement)                       2s complement) 
X        TB register value (unsigned)  X      RB register value (unsigned)  1 0 

D15-D8                                        D7-D2                         D1-D0 
X X X X X X X X                               Control register              1 1 


Table 8-3. Control Register Bit Fields 

Bits    Field                           Values 
D7-D6   Input gain                      00 = 1X for ±6-V analog input 
                                        01 = 2X for ±3-V analog input 
                                        10 = 4X for ±1.5-V analog input 
                                        11 = 1X for ±6-V analog input 
D5      Asynchronous transmit/receive   0 = disables, 1 = enables 
D4      AUX IN pins                     0 = disables, 1 = enables 
D3      Loopback function               0 = disables, 1 = enables 
D2      Bandpass filter                 0 = deletes, 1 = inserts 
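Following Table 8-3 and the last row of Figure 8-8, the control-register secondary word places the option bits in D7-D2 and sets D1-D0 to 11. The following is a sketch with hypothetical names.

```c
#include <stdint.h>

/* Input-gain codes for D7-D6 (Table 8-3). */
enum aic_gain { AIC_GAIN_1X = 0, AIC_GAIN_2X = 1, AIC_GAIN_4X = 2 };

static uint16_t aic_control_word(enum aic_gain gain, int async_xr,
                                 int aux_in, int loopback, int bandpass)
{
    uint16_t w = 0x3;                       /* D1-D0 = 11: control register  */
    w |= (uint16_t)((gain & 0x3) << 6);     /* D7-D6: input gain             */
    if (async_xr) w |= 1u << 5;             /* D5: async transmit/receive    */
    if (aux_in)   w |= 1u << 4;             /* D4: AUX IN pins               */
    if (loopback) w |= 1u << 3;             /* D3: loopback function         */
    if (bandpass) w |= 1u << 2;             /* D2: insert bandpass filter    */
    return w;
}
```

For example, selecting 4X gain with every other option off yields 0x83 (10000011b).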






The assembly code in Example 8-6 sets the TA and TB registers of the AIC. 
This code transmits a 16-bit word to the AIC and then waits until the serial port 
generates the transmit interrupt. The control word is transmitted first, followed 
by values that temporarily program the AIC to a very slow rate, then the TB 
and RB values, and finally the TA and RA values. TA and RA values should be 
the last values transmitted, since they change the AIC sample rate. By 
transmitting these values last, the sample rate is not changed until the AIC 
receives the last program word. In this way, very high sample rates can be 
achieved. Each command transmits three 16-bit words: a primary 
communication, a secondary communication, and a zero-data word. 
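Figure 8-8 and the A_REG and B_REG definitions in Example 8-6 specify how the A and B secondary words are packed. The following sketch builds those words and the three-word sequence described above: a primary word with D1-D0 = 11, the secondary word, and then a zero-data word. The primary word here carries a DAC value of 0, which is an assumption for illustration. The function names are hypothetical.

```c
#include <stdint.h>

/* A registers: TA in D13-D9, RA in D6-D2, D1-D0 = 00 (as A_REG). */
static uint16_t aic_secondary_a(unsigned ta, unsigned ra)
{
    return (uint16_t)(((ta & 0x1Fu) << 9) | ((ra & 0x1Fu) << 2) | 0x0u);
}

/* B registers: TB in D14-D9, RB in D7-D2, D1-D0 = 10 (as B_REG). */
static uint16_t aic_secondary_b(unsigned tb, unsigned rb)
{
    return (uint16_t)(((tb & 0x3Fu) << 9) | ((rb & 0x3Fu) << 2) | 0x2u);
}

/* One command: primary (DAC value 0, mode 11), secondary, zero word. */
static void aic_command(uint16_t secondary, uint16_t out[3])
{
    out[0] = 0x0003;      /* primary: request a secondary communication */
    out[1] = secondary;   /* register value to load                     */
    out[2] = 0x0000;      /* zero-data word                             */
}
```

With the Example 8-6 values (TA = RA = 12, TB = RB = 15), the A and B secondary words are 0x1830 and 0x1E3E.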
Example 8-6. Setting the TA and TB Registers 

;
; LOOPAIC.ASM is an example program which shows how to initialize and use
; the TLC32040. The analog output (DAC output) is either a ramp signal
; (RAMPEN = 1) or a loopback of the analog input (RAMPEN = 0).
;
;
; Define constants used by program
;
RAMPEN        .set   1                 ; Set to 1 to generate ramp at AOUT
T0_ctrl       .set   0x808020          ; TIM0 gl control
T0_count      .set   0x808024          ; TIM0 count
T0_prd        .set   0x808028          ; TIM0 prd
S0_gctrl      .set   0x808040          ; SP 0 global control
S0_xctrl      .set   0x808042          ; SP 0 FSX/DX/CLKX port ctl
S0_rctrl      .set   0x808043          ; SP 0 FSR/DR/CLKR port ctl
S0_xdata      .set   0x808048          ; SP 0 Data transmit
S0_rdata      .set   0x80804C          ; SP 0 Data receive
TA            .set   12                ; AIC timing register values
TB            .set   15                ;
RA            .set   12                ;
RB            .set   15                ;
GIE           .set   0x2000            ; This bit in ST turns on interrupts
;
; Define some constant storage data
;
A_REG         .word  (TA<<9)+(RA<<2)+0 ; A registers
B_REG         .word  (TB<<9)+(RB<<2)+2 ; B registers
C_REG         .word  10000011b         ; control
S0_gctrl_val  .word  0x0E970300        ; Serial port control register
S0_xctrl_val  .word  0x00000111        ; values
S0_rctrl_val  .word  0x00000111        ;
RAMP          .word  0                 ; RAMP count value
ADC_last      .word  0                 ; Last received ADC value
Example 8-6. Setting the TA and TB Registers (Continued) 

;
; Begin main code loop here
;
;*******************************************************************
;
main      or     GIE,ST            ; Turn on INTS
          ldi    0x30,IE           ; Enable XINT/RINT
          call   INIT              ;
          b      main              ; Do it again!
;
DAC2      push   ST                ; DAC Interrupt service routine
          push   R3                ;
          .if    RAMPEN            ; If RAMPEN=1 assemble this code
          ldi    @RAMP,R3          ;
          addi   256,R3            ; Add a value to RAMP
          sti    R3,@RAMP          ;
          .else                    ; Else assemble this
          ldi    @ADC_last,R3      ;
          .endif                   ;
          andn   3,R3              ;
          sti    R3,@S0_xdata      ; Output the new DAC value
          pop    R3                ;
          pop    ST                ;
          reti                     ;
;
ADC2      push   ST                ;
          push   R3                ;
          ldi    @S0_rdata,R3      ;
          sti    R3,@ADC_last      ;
          pop    R3                ;
          pop    ST                ;
          reti                     ;
;
;*******************************************************************
; The startup stub is used during initialization only              ;
; and can be safely overwritten by the stack or data                ;
;*******************************************************************
;
          .entry ST_STUB           ; Debugger starts here
INIT      ldp    T0_ctrl           ; Use kernel data page and stack
          ldi    0,R0              ; Halt TIM0 & TIM1
          sti    R0,@T0_ctrl       ;
          sti    R0,@T0_count      ; Set counts to 0
          ldi    1,R0              ; Set periods to 1
          sti    R0,@T0_prd        ;
          ldi    0x2C1,R0          ; Restart both timers in pulse mode
          sti    R0,@T0_ctrl       ;
          ldi    @S0_xctrl_val,R0  ;
          sti    R0,@S0_xctrl      ; transmit control
          ldi    @S0_rctrl_val,R0  ;
          sti    R0,@S0_rctrl      ; receive control
          ldi    0,R0              ;
          sti    R0,@S0_xdata      ; DXR data value
          ldi    @S0_gctrl_val,R0  ; Setup serial port
          sti    R0,@S0_gctrl      ; global control
Example 8-6. Setting the TA and TB Registers (Continued)

;****************************************************
; This section of code initializes the AIC
;****************************************************
AIC_INIT  LDI   0x10,IE             ; Enable only XINT interrupt
          andn  0x34,IF
          ldi   0,R0
          sti   R0,@S0_xdata
          RPTS  0x040
          LDI   2,IOF               ; XF0=0 resets AIC
          rpts  0x40
          LDI   6,IOF               ; XF0=1 runs AIC
          ldi   @C_REG,R0           ; Setup control register
          call  prog_AIC
          ldi   0xfffc,R0           ; Program the AIC to be real slow
          call  prog_AIC
          ldi   0xfffc|2,R0
          call  prog_AIC
          ldi   @B_REG,R0           ; Bump up the Fs to final rate
          call  prog_AIC            ; (smallest divisor should be last)
          ldi   @A_REG,R0
          call  prog_AIC
          b     main
;
prog_AIC  ldi   @S0_xdata,R1        ; Use original DXR data during secondary
          sti   R1,@S0_xdata
          idle
          ldi   @S0_xdata,R1        ; Use original DXR data during secondary
          or    3,R1                ; Request secondary XMIT
          sti   R1,@S0_xdata
          idle
          sti   R0,@S0_xdata        ; Send register value
          idle
          andn  3,R1
          sti   R1,@S0_xdata        ; Leave with original safe value in DXR
          ldi   @S0_rdata,R0        ; Fix the receiver underrun by reading
          rets                      ; the DRR before going to the main loop
;****************************************************
; Install the XINT/RINT ISR handler directly into
; the vector RAM location it will be used for
;****************************************************
          .start "SPOVECTS",0x809FC5
          .sect  "SPOVECTS"
          B     DAC2                ; XINT0
          B     ADC2                ; RINT0
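The A_REG and B_REG constants pack the AIC divider values into single secondary-communication words. The C sketch below reproduces that packing on a host so the assembled values can be checked. The helper names are hypothetical. The reading of the two low bits as a register-select code (0 for the A register, 2 for the B register, 3 in the control word C_REG = 10000011b) is inferred from this listing, not taken from the AIC data sheet.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the listing's .word expressions:
   A_REG .word (TA<<9)+(RA<<2)+0
   B_REG .word (TB<<9)+(RB<<2)+2 */
static uint32_t aic_a_reg(uint32_t ta, uint32_t ra)
{
    return (ta << 9) + (ra << 2) + 0u;  /* low bits 00: A register (inferred) */
}

static uint32_t aic_b_reg(uint32_t tb, uint32_t rb)
{
    return (tb << 9) + (rb << 2) + 2u;  /* low bits 10: B register (inferred) */
}
```

With the listing's TA = RA = 12 and TB = RB = 15, aic_a_reg(12, 12) returns 0x1830 and aic_b_reg(15, 15) returns 0x1E3E.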
8.5 TLC320AD58 Interface to the TMS320C3x

The TLC320AD58C serial interface provides several master and slave modes
for 16-bit or 18-bit data output, which makes it compatible with a wide range
of DSPs. To interface with the ’C3x 32-bit floating-point DSP, the 18-bit master
mode “100” was chosen to obtain 18-bit resolution and meet the ’C3x
serial port requirements. The timing diagram is shown in Figure 8-9.


Figure 8-9. TLC320AD58C Serial Interface 18-bit Master Mode “100” Timing Diagram 



The frame sync signal (FSYNC) designates valid data from the ADC and is
active for one shift clock period. After the falling edge of FSYNC, the
left-channel data is shifted out on the falling edge of SCLK, MSB (D17) first.
After the last data bit has been shifted out, the output remains low for
another 14 SCLKs, for a total of 32 SCLK periods per channel. After 32
SCLKs, LRCLK goes low and the right-channel data is shifted out. The FSYNC
and LRCLK frequencies are fixed at the sampling frequency (Fs = MCLK/256 or
MCLK/384, depending on the state of the CMODE input pin). The conversion
cycle is synchronized to the rising edge of LRCLK and, therefore, to the falling
edge of FSYNC. Although the data is shifted out in two separate time packets
representing the left- and right-channel digital outputs, the analog inputs are
sampled and converted simultaneously. In master mode, SCLK, FSYNC,
and LRCLK are generated internally from MCLK, depending on the state of
the CMODE input pin, as shown in Table 8-4.
Table 8-4. Master-Clock-to-Sample-Rate Conversion

MCLK (MHz)   CMODE   SCLK (MHz)   Sample Rate (kHz)
12.288       Low     3.072        48
18.432       High    3.072        48
11.290       Low     2.8224       44.1
16.934       High    2.8224       44.1
8.192        Low     2.048        32
12.288       High    2.048        32
0.256        Low     0.064        1
0.384        High    0.064        1
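The relations behind Table 8-4 can be checked with a few lines of C. This is a minimal sketch that assumes only what the text states: Fs = MCLK/256 with CMODE low, Fs = MCLK/384 with CMODE high, and 32 SCLK periods per channel (64 per sample period), so SCLK = 64 × Fs. The struct and function names are illustrative.

```c
/* Master-mode rates of the TLC320AD58C, derived from MCLK and CMODE. */
typedef struct {
    double fs_hz;    /* sample rate (FSYNC/LRCLK frequency) */
    double sclk_hz;  /* shift clock: 2 channels x 32 SCLKs per sample */
} ad58_rates;

static ad58_rates ad58_master_rates(double mclk_hz, int cmode_high)
{
    ad58_rates r;
    r.fs_hz   = mclk_hz / (cmode_high ? 384.0 : 256.0);
    r.sclk_hz = 64.0 * r.fs_hz;
    return r;
}
```

For example, ad58_master_rates(12.288e6, 0) gives Fs = 48 kHz and SCLK = 3.072 MHz, matching the first row of the table.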




The ’C30 has two bidirectional serial ports; the ’C31 and ’C32 each have one.
Each serial port controls six pins for receiving and transmitting data:
FSR/FSX, CLKR/CLKX, and DR/DX. Figure 8-10 shows the glueless interface
to the TLC320AD58C using the SCLK, FSYNC, and DOUT signals. Mode
“100” is set by pulling the MODE1 and MODE2 pins low and the MODE0 pin
high. The master clock is derived from the ’C3x so that all clock signals
are synchronized. The ’C3x runs at 49.152 MHz and provides the
required MCLK frequency of 12.288 MHz at the timer 0 output pin to
obtain a 48-kHz sample rate. CMODE must be pulled low. If other sample rates
are required, see Table 8-4.
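The source of the 12.288-MHz MCLK can be sketched with the timer clock-mode relation given in the C3x.h header comments (Example 8-8). The TCLK0 output frequency is f(CLKIN)/(8 × period) for a nonzero period, and f(CLKIN)/4 for period 0. The function below is a hypothetical host-side helper, not part of any TI library, and it covers only clock mode with the internal clock source.

```c
/* Timer output frequency in clock mode, per the relation in the
   C3x.h comments:
     f = f(CLKIN) / (8 * period)   for period > 0
     f = f(CLKIN) / 4              for period = 0 */
static double timer_clock_mode_hz(double clkin_hz, unsigned period)
{
    if (period == 0u)
        return clkin_hz / 4.0;                 /* special case: period 0 */
    return clkin_hz / (8.0 * (double)period);
}
```

With CLKIN = 49.152 MHz and a period of 0, this gives 12.288 MHz. The TLC320AD58C divides that by 256 (CMODE low) to produce the 48-kHz sample rate.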

The TLC320AD58C analog function blocks are initialized together with the
DSP by a system reset after all supply voltages are stable. The digital function
blocks are initialized by pulling DIGPD low for several microseconds. After
the rising edge of DIGPD, the device resumes normal operation. While DIGPD
is low, the TLC320AD58C digital function blocks are shut down and power
consumption is reduced. However, if power-down mode is not required, this signal
can be tied to ANAPD. In both cases, refer to the TI Data Acquisition Circuits
Data Book for setup timing requirements. All digital inputs and outputs of the
’C3x and the TLC320AD58C are 5-V TTL compatible. To reduce ringing and
overshoot, a series damping resistor (50 Ω) is recommended for the master
clock signal.
Figure 8-10. Interface Between the TMS320C3x and the TLC320AD58C
The ’C3x can be configured to receive a maximum of 32 bits of data per word.
However, the TLC320AD58C transmits a total of 64 bits after the FSYNC pulse
appears. This forces the DSP to read the left and right channels back-to-back.
To accomplish this, the ’C3x serial port configuration is toggled between
continuous mode and burst mode. In burst mode, FSYNC indicates the start of a
new data transfer; in continuous mode, the new data transfer starts
immediately after the last bit of the previous transfer has been shifted out. Both the
serial port and the timer registers are memory mapped. Eight memory-mapped
registers are provided for each serial port:

□ One global control register—defines the serial port configuration 

□ Two control registers—set the function of the CLKX/CLKR and FSX/FSR 
pins 

□ Three receive/transmit timer registers 

□ One data receive register 

□ One data transmit register 
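The eight registers listed above sit at fixed word offsets from each serial port's base address. The C sketch below names those offsets; the enum and helper names are illustrative. The offsets of the global control, port control, and data registers follow the S0_* addresses in Example 8-6 and the X_PORT/R_PORT/X_DATA/R_DATA defines in Example 8-8. The three timer-register offsets (4 to 6) are assumed from the ’C3x memory map and should be confirmed in the TMS320C3x User’s Guide.

```c
#include <stdint.h>

/* Word offsets of one 'C3x serial port's memory-mapped registers. */
enum sp_reg {
    SP_GLOBAL_CTRL = 0,   /* global control                        */
    SP_XPORT_CTRL  = 2,   /* FSX/DX/CLKX port control              */
    SP_RPORT_CTRL  = 3,   /* FSR/DR/CLKR port control              */
    SP_TIMER_CTRL  = 4,   /* R/X timer control (assumed)           */
    SP_TIMER_COUNT = 5,   /* R/X timer counter (assumed)           */
    SP_TIMER_PRD   = 6,   /* R/X timer period (assumed)            */
    SP_XDATA       = 8,   /* data transmit register                */
    SP_RDATA       = 12   /* data receive register                 */
};

#define SP_BASE   0x808040u  /* serial port 0 base address          */
#define SP_STRIDE 0x10u      /* serial port 1 starts 16 words later */

/* Absolute word address of register 'reg' of serial port 'port'. */
static uint32_t sp_reg_addr(unsigned port, enum sp_reg reg)
{
    return SP_BASE + SP_STRIDE * port + (uint32_t)reg;
}
```

For example, sp_reg_addr(1, SP_GLOBAL_CTRL) yields 0x808050, the serial port 1 base that Example 8-11 stores as SER_1.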
If the serial port shift clock (CLKR/CLKX) is generated externally, the
corresponding timer can be used as a general-purpose timer. See the TMS320C3x
User’s Guide for more information on the ’C3x serial port.
Example 8-7 shows the C code for interfacing a TLC320AD58 to the ’C3x.
Example 8-8 (page 8-36) shows the header file for the C code of
Example 8-7. Example 8-9 (page 8-38) shows the interrupt vector table
listing. These examples perform the following tasks:

□ Initialize the TLC320AD58C and the ’C30 serial port 1 to meet the 
TLC320AD58C serial interface timing requirements 

□ Set up the timer 0 period register to generate the required MCLK 
frequency 

On a serial port 1 receive interrupt, which occurs after 32 bits have been
received from either the left or the right channel, the program reads the serial
port receive register and converts the input signal into a floating-point number
in the range -1.0 to 1.0. It then changes the serial port configuration from
burst to continuous mode when the right channel has been received, or from
continuous to burst mode when the left channel has been received. The
transmit port is configured the same way as the receive port, for connection to
the 18-bit TMS57014A stereo DAC. Remember that the data must be written to
the data transmit register no later than three CLKX cycles before the FSYNC
pulse occurs (in burst mode) or before the next transfer starts (in continuous mode).
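The receive-side logic described above can be modeled as a pure C function, which makes it testable off-target. This is a sketch, not the listing itself. It assumes that the 18-bit sample arrives left-justified in the 32-bit receive word (D17 in bit 31, with the 14 trailing SCLKs reading as zeros), and that bits 10 and 11 (mask 0x0C00, as used in Example 8-7) select continuous mode. It scales by 2^17 so that a full-scale 18-bit code maps to ±1.0. The listing's divisor, 4.0*65536, equals 2^18, which would map full scale to ±0.5, so the scale factor is worth checking against the headroom a given system needs. The function names are hypothetical.

```c
#include <stdint.h>

#define FS_MODE_MASK 0x0C00u  /* frame-sync mode bits, as masked in Example 8-7 */

/* 18-bit two's-complement sample, left-justified in a 32-bit receive
   word, scaled so that full scale maps to [-1.0, 1.0). */
static float ad58_to_float(uint32_t rx_word)
{
    int32_t code = (int32_t)((rx_word >> 14) & 0x3FFFFu);  /* 18-bit field */
    if (code & 0x20000)          /* D17 set: negative sample    */
        code -= 0x40000;         /* sign-extend from 18 bits    */
    return (float)code / 131072.0f;  /* divide by 2^17          */
}

/* One receive interrupt: store the sample and return the new global
   control value. A word received in continuous mode is the left channel
   (switch to burst); one received in burst mode is the right channel
   (switch to continuous). */
static uint32_t ad58_rx_step(uint32_t gctrl, uint32_t rx_word,
                             float *left, float *right)
{
    if (gctrl & FS_MODE_MASK) {
        *left = ad58_to_float(rx_word);
        return gctrl & ~FS_MODE_MASK;   /* burst mode      */
    }
    *right = ad58_to_float(rx_word);
    return gctrl | FS_MODE_MASK;        /* continuous mode */
}
```

Keeping the mode decision in a function that takes and returns the control word, rather than touching hardware, lets the left/right alternation be exercised on a host before it is moved into the interrupt routine.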

Example 8-7. Interfacing the 18-bit TLC320AD58 to TMS320C3x 


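/**********************************************************************/
/* File: AD58.C                                                       */
/* Interfacing the 18-bit TLC320AD58 to TMS320C3x                     */
/**********************************************************************/

/*--------------------------------------------------------------------*/
/* include files                                                      */
/*--------------------------------------------------------------------*/
#include "vectors.h"
#include "c3x.h"

/* function prototypes */
void init_t0(void);
void init_s1(void);
void init_ad58(void);

/*--------------------------------------------------------------------*/
/* global variables                                                   */
/*--------------------------------------------------------------------*/
float l_channel;
float r_channel;

/*--------------------------------------------------------------------*/
/* main program                                                       */
/*--------------------------------------------------------------------*/
void main(void)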
Example 8-7. Interfacing the 18-bit TLC320AD58 to TMS320C3x (Continued)

{
    asm(" ldi 1800h,ST");        /* clear and enable cache            */
    asm(" ldi 0h,IE");           /* clear all interrupt masks         */
    asm(" ldi 0h,IF");           /* clear all pending interrupts      */
    init_t0();                   /* generate AD58 MCLK, if required   */
    init_s1();                   /* initialize serial port 1          */
    init_ad58();                 /* reset and start the TLC320AD58    */
    asm(" ldi _ERINT1_CPU,IE");  /* enable serial port 1 receive int  */
    asm(" or _GIEBIT,ST");       /* global enable interrupts          */
    while (1);                   /* wait on interrupt                 */
}

/*--------------------------------------------------------------------*/
/* Subroutine to initialize Serial Port 1 to communicate with AD58    */
/*--------------------------------------------------------------------*/
void init_s1(void)
{
    serial_port[1][X_PORT] = X1_MODE;
    serial_port[1][R_PORT] = R1_MODE;
    serial_port[1][GLOBAL] = S1_CONFIG;
}

/*--------------------------------------------------------------------*/
/* Subroutine to initialize Timer 0 to generate TLC320AD58 MCLK       */
/*--------------------------------------------------------------------*/
void init_t0(void)
{
    timer[0][GLOBAL]    = T0_HOLD;     /* hold timer, clock mode       */
    timer[0][T_COUNTER] = 0x0;
    timer[0][T_PERIOD]  = T0_PERIOD;
    timer[0][GLOBAL]    = T0_GO;       /* start timer                  */
}

/*--------------------------------------------------------------------*/
/* Serial Port Receive Interrupt Service Routine                      */
/*--------------------------------------------------------------------*/
void c_int08(void)
{
    /* reconfigure serial port to receive both channels within one frame sync */
    if (serial_port[1][GLOBAL] & 0x0C00)
    {
        /* read LEFT channel and normalize within -1.0..1.0 */
        l_channel = ((float)(serial_port[1][R_DATA] >> 14)) / (4.0 * 65536);

        /* switch to burst mode */
        serial_port[1][GLOBAL] = serial_port[1][GLOBAL] & 0xFFFFF3FF;

        /* if transmitting to DAC, make sure to write to the transmit register no
           later than 3 SCLK=CLKX cycles before the rising edge of FSYNC */
    }
Example 8-7. Interfacing the 18-bit TLC320AD58 to TMS320C3x (Continued)

    else
    {
        /* read RIGHT channel and normalize within -1.0..1.0 */
        r_channel = ((float)(serial_port[1][R_DATA] >> 14)) / (4.0 * 65536);

        /* switch to continuous mode */
        serial_port[1][GLOBAL] = serial_port[1][GLOBAL] | 0x0C00;

        /* if transmitting to DAC, make sure to write to the transmit register no
           later than 3 SCLK=CLKX cycles before the next transfer */
    }
}

/*--------------------------------------------------------------------*/
/* Subroutine to Initialize TLC320AD58                                */
/*--------------------------------------------------------------------*/
void init_ad58(void)
{
    asm(" ldi 0010b,IOF");   /* reset XF0, power down AD58          */
    asm(" rpts 2500");       /* wait for 100 usec before            */
    asm(" nop");             /* releasing DIGPD                     */
    asm(" ldi 0110b,IOF");   /* AD58 normal operation               */
}
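The 2500 in init_ad58() sets the length of the DIGPD low pulse. A rough host-side check of that delay follows. It assumes one instruction cycle equals two CLKIN periods (H1 = CLKIN/2) and that the repeated nop runs from zero-wait-state memory at one cycle per pass. It also ignores the few cycles spent on RPTS itself. The helper names are hypothetical.

```c
#include <stdint.h>

/* Delay, in nanoseconds, of "rpts n" followed by a one-cycle nop:
   the nop executes n + 1 times, one H1 cycle each, H1 = CLKIN/2. */
static uint64_t rpts_delay_ns(uint32_t n, uint32_t clkin_hz)
{
    uint64_t h1_hz = clkin_hz / 2u;
    return ((uint64_t)n + 1u) * 1000000000ull / h1_hz;
}

/* Smallest rpts count whose n + 1 passes last at least delay_us. */
static uint32_t rpts_count_for_us(uint32_t delay_us, uint32_t clkin_hz)
{
    uint64_t h1_hz  = clkin_hz / 2u;
    uint64_t cycles = ((uint64_t)delay_us * h1_hz + 999999u) / 1000000u;
    return cycles ? (uint32_t)(cycles - 1u) : 0u;
}
```

Under these assumptions, rpts_delay_ns(2500, 49152000) is about 101.8 µs, so the listing's value slightly exceeds 100 µs at 49.152 MHz. rpts_count_for_us(100, 49152000) returns 2457 as the smallest count that reaches 100 µs.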
Example 8-8. C3x.h, Header File Listing

/**********************************************************************/
/* FILE: C3X.H                                                        */
/*                                                                    */
/* TMS320C3X CONTROL REGISTER SETTINGS TO SETUP INTERFACE WITH        */
/* TLC320AD58 18 BIT MASTER MODE                                      */
/**********************************************************************/

/*--------------------------------------------------------------------*/
/* Serial Port 1 Initialization                                       */
/*--------------------------------------------------------------------*/
#define X1_MODE    0x00000111  /* FSX/DX/CLKX are serial port pins    */
#define R1_MODE    0x00000111  /* FSR/DR/CLKR are serial port pins    */
#define S1_CONFIG  0x0EBC3C00  /* Serial port configuration:          */
                               /*   FSX/FSR input                     */
                               /*   FSX/FSR signals active high       */
                               /*   external CLKX/R                   */
                               /*   CLKX/CLKR active low              */
                               /*   fixed data rate mode              */
                               /*   32-bit data width                 */
                               /*   TX/RX interrupts are enabled      */
                               /*   XRESET/RRESET set to 1            */
                               /*   (take out of reset)               */

/*--------------------------------------------------------------------*/
/* Timer 0 Initialization                                             */
/*--------------------------------------------------------------------*/
/* TOUT frequency (clock mode) = 1/[8*tCLKIN*T0_PERIOD], if T0_PERIOD > 0 */
/*                             = 1/[4*tCLKIN],           if T0_PERIOD = 0 */
/*   where tCLKIN is the CLKIN period                                 */
#define T0_PERIOD  0       /* TOUT0 = 12.288 MHz for 49.152 MHz CLKIN */
#define T0_HOLD    0x0301  /* clock mode, 50% duty cycle, timer held  */
#define T0_GO      0x03C1  /* clock mode, 50% duty cycle, running     */

/*--------------------------------------------------------------------*/
/* Interrupt Mask                                                     */
/*--------------------------------------------------------------------*/
asm("_ERINT1_CPU .set 80h");    /* enable serial port 1 receive int   */
asm("_GIEBIT     .set 2000h");  /* global enable interrupts           */
Example 8-8. C3x.h, Header File Listing (Continued)

/*--------------------------------------------------------------------*/
/* TMS320C3X CONTROL REGISTER LOCATIONS                               */
/*--------------------------------------------------------------------*/

/*--------------------------------------------------------------------*/
/* Serial Ports                                                       */
/*--------------------------------------------------------------------*/
/* SERIAL PORT BASE LOCATION */
volatile int (*serial_port)[16] = (volatile int (*)[16]) 0x808040;

/* SERIAL PORT CONTROL REGISTERS */
#define GLOBAL  0    /* GLOBAL CONTROL   */
#define X_PORT  2    /* TRANSMIT CONTROL */
#define R_PORT  3    /* RECEIVE CONTROL  */
#define X_DATA  8    /* TRANSMIT DATA    */
#define R_DATA  12   /* RECEIVE DATA     */

/*--------------------------------------------------------------------*/
/* Timer                                                              */
/*--------------------------------------------------------------------*/
/* timer base location */
volatile int (*timer)[16] = (volatile int (*)[16]) 0x808020;

#define T_COUNTER 4
#define T_PERIOD  8
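S1_CONFIG (0x0EBC3C00) is a packed serial-port global control word. The sketch below rebuilds it from named fields so that each comment in the header can be matched to a bit. The macro names are illustrative and are not defined in C3x.h. The bit positions are assumed from the ’C3x serial port global control register description in the TMS320C3x User’s Guide, so confirm them there before reuse. Bits 10 and 11 are the frame-sync mode bits that the receive ISR in Example 8-7 toggles with the mask 0x0C00, which means S1_CONFIG starts the port in continuous mode.

```c
#include <stdint.h>

/* Serial port global control fields (positions assumed, names illustrative). */
#define SP_XFSM     ((uint32_t)1 << 10)  /* transmit frame sync: 1 = continuous */
#define SP_RFSM     ((uint32_t)1 << 11)  /* receive frame sync: 1 = continuous  */
#define SP_CLKXP    ((uint32_t)1 << 12)  /* CLKX active low                     */
#define SP_CLKRP    ((uint32_t)1 << 13)  /* CLKR active low                     */
#define SP_XLEN(n)  ((uint32_t)(n) << 18) /* 3 = 32-bit transmit words          */
#define SP_RLEN(n)  ((uint32_t)(n) << 20) /* 3 = 32-bit receive words           */
#define SP_XINT     ((uint32_t)1 << 23)  /* transmit interrupt enable           */
#define SP_RINT     ((uint32_t)1 << 25)  /* receive interrupt enable            */
#define SP_XRESET   ((uint32_t)1 << 26)  /* 1 = transmitter out of reset        */
#define SP_RRESET   ((uint32_t)1 << 27)  /* 1 = receiver out of reset           */

/* All other fields left at 0: external clocks, fixed data rate,
   FSX/FSR as inputs, active-high frame syncs, timer interrupts off. */
#define S1_CONFIG_SKETCH (SP_RRESET | SP_XRESET | SP_RINT | SP_XINT | \
                          SP_RLEN(3) | SP_XLEN(3) | SP_CLKRP | SP_CLKXP | \
                          SP_RFSM | SP_XFSM)
```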








Example 8-9. TMS320C3X Interrupt Vector Table Listing



/*--------------------------------------------------------------------*/
/* Filename: vectors.h   Defines interrupt vectors and trap vectors   */
/*                       for C programs                               */
/* Usage: #include "vectors.h"                                        */
/* Modifications: If you add interrupt service routines, modify this  */
/*                file to insert the vectors at the proper location   */
/*                in the vector table.                                */
/*--------------------------------------------------------------------*/
asm("        .global _c_int00");
asm("        .global _c_int08");
asm("        .sect   \"vectors\"");
asm("RESET   .word   _c_int00   ; external RESET-");
asm("INT0    .word   _c_int99   ; external INT0-");
asm("INT1    .word   _c_int99   ; external INT1-");
asm("INT2    .word   _c_int99   ; external INT2-");
asm("INT3    .word   _c_int99   ; external INT3-");
asm("XINT0   .word   _c_int99   ; Serial port 0 XMT");
asm("RINT0   .word   _c_int99   ; Serial port 0 RCV");
asm("XINT1   .word   _c_int99   ; Serial port 1 XMT");
asm("RINT1   .word   _c_int08   ; Serial port 1 RCV");
asm("TINT0   .word   _c_int99   ; Timer 0");
asm("TINT1   .word   _c_int99   ; Timer 1");
asm("DINT    .word   _c_int99   ; DMA complete");
asm("        .space  20         ; Reserved space");
asm("TRAP0");
asm("        .loop   28         ; TRAPS 0-27 are");
asm("        .word   _c_int99   ; undefined traps");
asm("        .endloop");
asm("        .space  4          ; TRAPS 28-31 reserved");

/*--------------------------------------------------------------------*/
/* NOTE: Put all interrupt handlers AFTER this next statement!        */
/*--------------------------------------------------------------------*/
asm("        .text");

void c_int99() { }   /* Spurious interrupt handler */
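The vectors.h layout fixes where each handler address lands: twelve interrupt vectors, 20 reserved words, 28 trap vectors, and 4 reserved words. The enums below are an illustrative sketch whose names are not part of any TI header. They count those slots to show that the serial port 1 receive vector (RINT1) is word 8, the slot filled with _c_int08, and that TRAP0 starts at word 20h. The _c_int06 and _c_int08 names used in Example 8-10 appear to follow the same slot numbering.

```c
/* Word offsets in the vector table built by vectors.h. */
enum vec_slot {
    VEC_RESET, VEC_INT0, VEC_INT1, VEC_INT2, VEC_INT3,
    VEC_XINT0, VEC_RINT0, VEC_XINT1, VEC_RINT1,
    VEC_TINT0, VEC_TINT1, VEC_DINT,          /* slots 0 through 11 */
    VEC_FIRST_RESERVED                       /* slot 12            */
};

enum {
    VEC_RESERVED_WORDS = 20,                              /* .space 20 */
    VEC_TRAP0 = VEC_FIRST_RESERVED + VEC_RESERVED_WORDS,  /* 32 = 20h  */
    VEC_TRAP_COUNT = 28,                                  /* .loop 28  */
    VEC_TABLE_WORDS = VEC_TRAP0 + VEC_TRAP_COUNT + 4      /* .space 4  */
};
```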






8.6 CS4215 Interface to the TMS320C3x 

Figure 8-11 shows how to interface the ’C3x with zero glue logic to Crystal 
Semiconductor’s CS4216 16-bit stereo codec. 


Figure 8-11. TMS320C3x-to-CS4216 Interface 



Example 8-10 through Example 8-16 show the assembly and C language
code, with the respective header files, that programs and interfaces the ’C3x to
the CS4215. Example 8-10 shows the CS4215 driver interrupt vector table.
Example 8-11 (page 8-41) shows the ’C3x serial port transmit interrupt
service routine. Example 8-12 (page 8-44) and Example 8-13 (page 8-46)
display the C code header files. Example 8-14 (page 8-47) shows the C
language common driver routines. Example 8-15 (page 8-49) is the C code
header file for Example 8-16 (page 8-59), which displays the C language
driver routines for the CS4215.

These files can be downloaded from the Texas Instruments BBS or ftp site
(filename C3x4215.EXE).
Example 8-10. vecs.asm

;***********************************************************
; vecs.asm
;
; staff
; 01-03-92
;
; (C) Texas Instruments Inc., 1992
;
; Refer to the file 'license.txt' included with this
; package for usage and license information.
;***********************************************************
*************************************************************
* VECS.ASM                                                  *
* C3x - CS4215 DRIVER INTERRUPT VECTOR TABLE                *
*                                                           *
* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                       *
*************************************************************
* INTERRUPT AND RESET VECTORS                               *
*************************************************************
        .sect  "vecs"     ; interrupt and reset vectors
        .ref   _c_int00   ; compiler defined C initialization reset
        .ref   _c_int06   ; serial port transmit interrupt service routine
        .ref   _c_int08   ; serial port transmit interrupt service routine
        .ref   _c_int99   ; unexpected interrupt handler
reset:  .word  _c_int00
int0:   .word  _c_int99
int1:   .word  _c_int99
int2:   .word  _c_int99
int3:   .word  _c_int99
xint0:  .word  _c_int99
rint0:  .word  _c_int06
xint1:  .word  _c_int99
rint1:  .word  _c_int08
tint0:  .word  _c_int99
tint1:  .word  _c_int99
dint:   .word  _c_int99
Example 8-11. C_int.asm

;**************************************************************
; c_int.asm
;
; Leor Brenman
; 03-16-92
;
; (C) Texas Instruments Inc., 1992
;
; Refer to the file 'license.txt' included with this
; package for usage and license information.
;**************************************************************
***************************************************************
* C_INT08(VOID)
* Hand-coded assembly language interrupt service routine.
* This serial port transmit ISR supports the CS4215 zero
* chip I/F to the C3x serial port.
* This ISR has been hand-coded for speed optimization.
*
* Leor Brenman, DSP Applications
* (C) 1991 TEXAS INSTRUMENTS, HOUSTON
***************************************************************
        .global _c_int08
******************
* global variables
******************
        .global _first_half, _input_xfer0, _input_xfer1, _buffer_size
        .global _buffer_index, _output_xfer0
        .global _output_xfer1, _output0, _output1, _data_control
        .global _buffer_rdy, _input0, _input1

* global variables
        .data
SER_1   .word   808050h     ; place in same page as .bss
                            ; to eliminate push/pop of DP when loading
                            ; serial port one's base address

* FUNCTION DEF : _c_int08
        .text
_c_int08:
        PUSH    ST
        PUSH    R0
        PUSHF   R0
        PUSH    AR0
Example 8-11. C_int.asm (Continued)

***********************************************************************
* if this is the first half of the transmission then goto FRST_HALF
***********************************************************************
        LDI     @_first_half,R0
        BNZ     FRST_HALF
***********************************************************************
* else, this is the second half of the transmission
***********************************************************************
SCND_HALF:
***********************************************************************
* load AR0 with serial port base address
* do dummy read of serial port to empty control info from serial port
***********************************************************************
        LDI     @SER_1,AR0
        LDI     *+AR0(12),R0
***********************************************************************
* get control value and write to serial port while branching to end of ISR
* and set first_half flag to TRUE
***********************************************************************
        LDI     @_data_control+1,R0
        BD      FIN_S
        STI     R0,*+AR0(8)
        LDI     1,R0
        STI     R0,@_first_half
***********************************************************************
* This is the first half of the transmission
***********************************************************************
FRST_HALF:
***********************************************************************
* push remaining registers
***********************************************************************
        PUSH    R1
        PUSHF   R1
        PUSH    AR1
        PUSH    IR0
Example 8-11. C_int.asm (Continued)

***********************************************************************
* set first_half flag to FALSE
***********************************************************************
        LDI     0,R0
        STI     R0,@_first_half

        POP     AR0
        POPF    R0
        POP     R0
        POP     ST
        RETI
Example 8-12. General.h

/*******************************************************************/
/* general.h v4.2                                                  */
/* Copyright (c) 1991 Texas Instruments Incorporated               */
/*******************************************************************/

#ifndef _GENERAL
#define _GENERAL

/*******************************************************************/
/* COMMON MACRO DEFINITIONS                                        */
/*******************************************************************/
#ifndef OFF
#define OFF   0x00
#endif

#ifndef ON
#define ON    0x01
#endif

#ifndef FALSE
#define FALSE 0x00
#endif

#ifndef TRUE
#define TRUE  0x01
#endif

#ifndef CLEAR
#define CLEAR 0x00
#endif

#ifndef SET
#define SET   0x01
#endif
Example 8-12. General.h (Continued)

/*******************************************************************/
/* GENERAL C3x MACROS                                              */
/*******************************************************************/
#ifndef INIT_XF_PINS
#define INIT_XF_PINS          asm(" LDI 00h,IOF")
#endif

#ifndef CL_INT_FL_REG
#define CL_INT_FL_REG         asm(" LDI 0h,IF")
#endif

#ifndef EN_GLOBAL_INTS
#define EN_GLOBAL_INTS        asm(" OR 2000h,ST")
#endif

#ifndef EN_SER_PORT_XMT_INT_0
#define EN_SER_PORT_XMT_INT_0 asm(" OR 10h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_0
#define EN_SER_PORT_RCV_INT_0 asm(" OR 20h,IE")
#endif

#ifndef EN_SER_PORT_XMT_INT_1
#define EN_SER_PORT_XMT_INT_1 asm(" OR 40h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_1
#define EN_SER_PORT_RCV_INT_1 asm(" OR 80h,IE")
#endif

#ifndef ENABLE_CACHE
#define ENABLE_CACHE          asm(" OR 800h,ST")
#endif

#endif /* #ifndef _GENERAL */
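Several listings in this chapter drive the XF pins through the IOF register. Example 8-6 writes 2 and 6 to reset and release the AIC, Example 8-7 writes 0010b and 0110b, the CS4215 DCB macros use 20h and 60h, and the General.h macro that initializes the XF pins writes 00h. The C sketch below shows how those constants decompose. It is a hypothetical helper with bit positions assumed from the TMS320C3x User’s Guide: I/OXF0 = bit 1, OUTXF0 = bit 2, I/OXF1 = bit 5, OUTXF1 = bit 6.

```c
#include <stdint.h>

#define IOF_IO_XF0  (1u << 1)   /* 1 = XF0 is a general-purpose output */
#define IOF_OUT_XF0 (1u << 2)   /* level driven on XF0 when output     */
#define IOF_IO_XF1  (1u << 5)   /* 1 = XF1 is a general-purpose output */
#define IOF_OUT_XF1 (1u << 6)   /* level driven on XF1 when output     */

/* IOF value that configures XF<xf> (0 or 1) as an output driving 'level'. */
static uint32_t iof_xf_output(unsigned xf, int level)
{
    uint32_t io  = xf ? IOF_IO_XF1 : IOF_IO_XF0;
    uint32_t out = xf ? IOF_OUT_XF1 : IOF_OUT_XF0;
    return io | (level ? out : 0u);
}
```

iof_xf_output(0, 0) and iof_xf_output(0, 1) give 2 and 6, the values written by LDI 2,IOF and LDI 6,IOF. With xf = 1 the results are 20h and 60h. Writing 0 leaves both pins configured as inputs.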
Example 8-13. Commdrvr.h

/*******************************************************************/
/* COMMDRVR.H                                                      */
/*                                                                 */
/* TMS320C3X - COMMON DRIVER HEADER FILE                           */
/* :TMS320C3x CODE                                                 */
/* Compile and archive into appropriate driver library             */
/*                                                                 */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                             */
/*******************************************************************/
#include <c30_per.h>

/*******************************************************************/
/* COMMON STRUCTURES                                               */
/*******************************************************************/
typedef volatile int    VI;
typedef volatile float  VF;
typedef VF * volatile   VPVF;
typedef VI * volatile   VPVI;

/*******************************************************************/
/* FUNCTION PROTOTYPES                                             */
/*******************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);
Example 8-14. Commdrvr.c

/*******************************************************************/
/* commdrvr.c                                                      */
/*                                                                 */
/* staff                                                           */
/* 01-15-92                                                        */
/*                                                                 */
/* (C) Texas Instruments Inc., 1992                                */
/*                                                                 */
/* Refer to the file 'license.txt' included with this              */
/* package for usage and license information.                      */
/*******************************************************************/

/*******************************************************************/
/* COMMDRVR.C                                                      */
/*                                                                 */
/* TMS320C3X - COMMON DRIVER ROUTINES                              */
/* :TMS320C3x CODE                                                 */
/* Compile and archive into aic.lib                                */
/*                                                                 */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                             */
/*******************************************************************/
#include <commdrvr.h>

/*******************************************************************/
/* C_INT99(): ERRONEOUS INTERRUPT SERVICE ROUTINE                  */
/* THIS ROUTINE IDLES AFTER RECEIVING AN UNEXPECTED INTERRUPT      */
/*******************************************************************/
void c_int99(void)
{
    for (;;);
}

/*******************************************************************/
/* HEAP_OVERFLOW(): NOT ENOUGH MEMORY IN THE HEAP                  */
/* THIS ROUTINE IS AN ERROR HANDLER FOR WHEN MEMORY                */
/* CANNOT BE ALLOCATED FROM THE HEAP                               */
/*******************************************************************/
void heap_overflow(void)
{
    for (;;);
}

/*******************************************************************/
/* INIT_C30(): INITIALIZE TMS320C30                                */
/*******************************************************************/
void init_c30(void)
Example 8-14. Commdrvr.c (Continued)

{
    BUS_ADDR->exp_gcontrol  = 0x0;
    BUS_ADDR->prim_gcontrol = 0x0;
    INIT_XF_PINS;
    ENABLE_CACHE;
}

/*******************************************************************/
/* ERROR_IN_REAL_TIME(): ERROR HANDLER, PROCESSING TIME IS GREATER */
/* THAN I/O TIME.                                                  */
/*******************************************************************/
void error_in_real_time(void)
{
    for (;;);
}
Example 8-15. CS4215.h

/*******************************************************************/
/* CS4215.H                                                        */
/*                                                                 */
/* TMS320C3X - CRYSTAL 4215 MM CODEC                               */
/* :TMS320C3x CODE                                                 */
/*                                                                 */
/* Leor Brenman, DSP Applications                                  */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                             */
/*******************************************************************/
#include <math.h>
#include <stdlib.h>
#include <c30_per.h>
#include <commdrvr.h>

/*-----------------------------------------------------------------*/
/* MACROS                                                          */
/*-----------------------------------------------------------------*/
#define BLOCK_SIZE      64
#define SER_NUM         SERIAL_PORT_ONE
#define TIMER_NUM       TIMER_ONE
#define XF_NUM          1
#define INIT_ARRAYS     init_arrays(buffer_size)
#define WAIT_BUFFERS    while(!buffer_rdy);
#define RESET_FLAGS     buffer_rdy = FALSE
#define RESET_CODEC     TIMER_ADDR(TIMER_NUM)->gcontrol = I_O | HLD_
#define UN_RESET_CODEC  TIMER_ADDR(TIMER_NUM)->gcontrol = I_O | HLD_ | DATOUT

#if XF_NUM
#define DCB_LOW   asm(" AND 2fh,IOF"); asm(" OR 20h,IOF")
#define DCB_HI    asm(" OR 60h,IOF")
#else
#define DCB_LOW   asm(" AND 0F2h,IOF"); asm(" OR 2h,IOF")
#define DCB_HI    asm(" OR 6h,IOF")
#endif

#define WAIT(A)   for(i=0;i<A;i++);
#define C_ISR     ON
Example 8-15. CS4215.h (Continued)

/*******************************************************************/
/* CS4215 DATA COMMAND BIT FIELD DATA STRUCTURES                   */
/*******************************************************************/

/*******************************************************************/
/* CONTROL COMMAND                                                 */
/*******************************************************************/
typedef union
{
    unsigned int _intval[2];
    struct
    {
        /* Time slot 4 */
        unsigned int adl  : 1;  /* Loopback mode                       */
        unsigned int enl  : 1;  /* Enable loopback testing             */
        unsigned int d_r5 : 6;  /* Unused - don't care bits: 2 - 7     */
        /* Time slot 3 */
        unsigned int xen  : 1;  /* Transmitter enable                  */
        unsigned int xclk : 1;  /* Transmit clock                      */
        unsigned int bsel : 2;  /* Select bit rate                     */
        unsigned int mckf : 2;  /* Clock source select                 */
        unsigned int d_r4 : 2;  /* Unused - don't care bits: 6 - 7     */
        /* Time slot 2 */
        unsigned int df   : 2;  /* Data format selection               */
        unsigned int st   : 1;  /* Stereo bit: 0-mono, 1-stereo        */
        unsigned int dfr  : 3;  /* Data conversion freq selection      */
        unsigned int d_r3 : 2;  /* Unused - don't care bits: 6 - 7     */
        /* Time slot 1 */
        unsigned int d_r1 : 2;  /* Unused - don't care bits: 0 - 1     */
        unsigned int dcb  : 1;  /* Data control handshake bit          */
        unsigned int d_r2 : 5;  /* Unused - don't care bits: 3 - 7     */
        /* Time slot 8 */
        unsigned int d_r9 : 8;  /* Unused - don't care bits: 0 - 7     */
        /* Time slot 7 */
        unsigned int rv   : 4;  /* Revision level of the CS4215        */
        unsigned int d_r8 : 4;  /* Unused - don't care bits: 4 - 7     */
        /* Time slot 6 */
        unsigned int d_r7 : 8;  /* Unused - don't care bits: 0 - 7     */
        /* Time slot 5 */
        unsigned int d_r6 : 6;  /* Unused - don't care bits: 0 - 5     */
        unsigned int pio  : 2;  /* Parallel port control               */
    } _bitval;
} CONTROL;
/************************************************************************/
/* data commands                                                        */
/************************************************************************/
typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 3 & 4 */
    signed int   right : 16;  /* Right channel 16 bit               */
    /* Time slots 1 & 2 */
    signed int   left  : 16;  /* Left channel 16 bit                */
    /* Time slot 8 */
    unsigned int rg    : 4;   /* Right input gain settings          */
    unsigned int ma    : 4;   /* Monitor path attenuation           */
    /* Time slot 7 */
    unsigned int lg    : 4;   /* Left input gain settings           */
    unsigned int is    : 1;   /* Input selection                    */
    unsigned int ovr   : 1;   /* Overrange                          */
    unsigned int pio   : 2;   /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    : 6;   /* Right output attenuation setting   */
    unsigned int se    : 1;   /* Speaker output enable control      */
    unsigned int d_r1  : 1;   /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    : 6;   /* Left output attenuation setting    */
    unsigned int le    : 1;   /* Line output enable control         */
    unsigned int he    : 1;   /* Headphone output enable control    */
  } _bitval;
} STEREO_16;







typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 3 & 4 */
    signed int   d_r1 : 16;   /* Unused - don't care bits 0-15      */
    /* Time slots 1 & 2 */
    signed int   left : 16;   /* Left channel 16 bit                */
    /* Time slot 8 */
    unsigned int d_r3 : 4;    /* Unused - don't care bits: 0-3      */
    unsigned int ma   : 4;    /* Monitor path attenuation           */
    /* Time slot 7 */
    unsigned int lg   : 4;    /* Left input gain settings           */
    unsigned int is   : 1;    /* Input selection                    */
    unsigned int ovr  : 1;    /* Overrange                          */
    unsigned int pio  : 2;    /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro   : 6;    /* Right output attenuation setting   */
    unsigned int se   : 1;    /* Speaker output enable control      */
    unsigned int d_r2 : 1;    /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo   : 6;    /* Left output attenuation setting    */
    unsigned int le   : 1;    /* Line output enable control         */
    unsigned int he   : 1;    /* Headphone output enable control    */
  } _bitval;
} MONO_16;

typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slot 4 */
    signed int   d_r2  : 8;   /* Unused - don't care bits 0-7       */
    /* Time slot 3 */
    signed int   right : 8;   /* Right channel 8 bit                */
    /* Time slot 2 */
    signed int   d_r1  : 8;   /* Unused - don't care bits 0-7       */
    /* Time slot 1 */
    signed int   left  : 8;   /* Left channel 8 bit                 */
    /* Time slot 8 */
    unsigned int rg    : 4;   /* Right input gain settings          */
    unsigned int ma    : 4;   /* Monitor path attenuation           */
    /* Time slot 7 */
    unsigned int lg    : 4;   /* Left input gain settings           */
    unsigned int is    : 1;   /* Input selection                    */
    unsigned int ovr   : 1;   /* Overrange                          */
    unsigned int pio   : 2;   /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    : 6;   /* Right output attenuation setting   */
    unsigned int se    : 1;   /* Speaker output enable control      */
    unsigned int d_r3  : 1;   /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    : 6;   /* Left output attenuation setting    */
    unsigned int le    : 1;   /* Line output enable control         */
    unsigned int he    : 1;   /* Headphone output enable control    */
  } _bitval;
} STEREO_8;

typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 2 - 4 */
    signed int   d_r1 : 24;   /* Unused - don't care bits 0-23      */
    /* Time slot 1 */
    signed int   left : 8;    /* Left channel 8 bit                 */
    /* Time slot 8 */
    unsigned int d_r3 : 4;    /* Unused - don't care bits: 0-3      */
    unsigned int ma   : 4;    /* Monitor path attenuation           */
    /* Time slot 7 */
    unsigned int lg   : 4;    /* Left input gain settings           */
    unsigned int is   : 1;    /* Input selection                    */
    unsigned int ovr  : 1;    /* Overrange                          */
    unsigned int pio  : 2;    /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro   : 6;    /* Right output attenuation setting   */
    unsigned int se   : 1;    /* Speaker output enable control      */
    unsigned int d_r2 : 1;    /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo   : 6;    /* Left output attenuation setting    */
    unsigned int le   : 1;    /* Line output enable control         */
    unsigned int he   : 1;    /* Headphone output enable control    */
  } _bitval;
} MONO_8;

typedef union
{
  unsigned int _intval[2];
  CONTROL      control;
  STEREO_16    stereo_16;
  MONO_16      mono_16;
  STEREO_8     stereo_8;
  MONO_8       mono_8;
} CS4215_WORD;
/*========================================================================*/
/* GLOBAL VARIABLES                                                       */
/*========================================================================*/
extern int         buffer_size;   /* SIZE OF I/O BUFFER(S)                   */
extern VPVF        output0;       /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF        input0;        /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF        output_xfer0;  /* OUTPUT DATA BUFFER FOR ISR/CODEC        */
extern VPVF        input_xfer0;   /* INPUT DATA BUFFER FOR ISR/CODEC         */
extern VPVF        output1;       /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF        input1;        /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF        output_xfer1;  /* OUTPUT DATA BUFFER FOR ISR/CODEC        */
extern VPVF        input_xfer1;   /* INPUT DATA BUFFER FOR ISR/CODEC         */
extern VI          buffer_rdy;    /* CPU-ISR COMM FLAG (INPUT)               */
extern VI          buffer_index;  /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI          i;             /* GENERIC COUNTER VARIABLE                */
extern VI          first_half;    /* FIRST/SECOND HALF OF 64-BIT FRAME       */
extern CS4215_WORD data_control;

/************************************************************************/
/* FUNCTION PROTOTYPES                                                  */
/************************************************************************/
/***************************/
/* CS4215 DRIVER FUNCTIONS */
/***************************/
void init_arrays(int buffer_size);
void init_4215(int crystal, int sample_rate);
#if SER_NUM
void c_int07(void);
#else
void c_int05(void);
#endif

/************************************************************************/
/* CS4215 DATA COMMAND BIT FIELD MACROS                                 */
/************************************************************************/
/************************************************************************/
/* CONTROL COMMAND MACROS                                               */
/************************************************************************/
#define DATA                 1
#define COMM                 0
#define SIXTEEN_BIT_LINEAR   0
#define EIGHT_BIT_U_LAW      1
#define EIGHT_BIT_A_LAW      2
#define MONO_MODE            0
#define STEREO_MODE          1
/* Data conversion frequency selections. Assumes that XTAL1 = 24.576 MHz */
/* and XTAL2 = 16.9344 MHz                                               */
/*                               XTAL1 (kHz) | XTAL2 (kHz)               */
#define CONV_FREQ_0        0   /*  8.00000  |   5.5125  */
#define CONV_FREQ_1        1   /* 16.00000  |  11.0250  */
#define CONV_FREQ_2        2   /* 27.42857  |  18.9000  */
#define CONV_FREQ_3        3   /* 32.00000  |  22.0500  */
#define CONV_FREQ_4        4   /*    NA     |  37.8000  */
#define CONV_FREQ_5        5   /*    NA     |  44.1000  */
#define CONV_FREQ_6        6   /* 48.00000  |  33.0750  */
#define CONV_FREQ_7        7   /*  9.60000  |   6.6150  */

#define CS_ENABLE          0   /* Data output enabled          */
#define CS_DISABLE         1   /* Data output disabled         */
#define CS_TCLOCK_EXT      0   /* FSYNC and SCLK are inputs    */
#define CS_TCLOCK_INT      1   /* FSYNC and SCLK are outputs   */
#define BPF_64             0   /* 64 bits per frame            */
#define BPF_128            1   /* 128 bits per frame           */
#define BPF_256            2   /* 256 bits per frame           */
#define CS_CLOCK_SCLK      0   /* Clock source select: SCLK    */
#define CS_CLOCK_XTAL1     1   /* Clock source select: XTAL1   */
#define CS_CLOCK_XTAL2     2   /* Clock source select: XTAL2   */
#define CS_CLOCK_EXT       3   /* Clock source select: Ext     */
#define DIGITAL_LOOPBACK   0
#define ANALOG_LOOPBACK    1
#define LOOP_ENABLE        1
#define LOOP_DISABLE       0
/************************************************************************/
/* data command macros                                                  */
/************************************************************************/
/* Output attenuation is 1.5 dB per unit integer value */
/*                          Attenuation (dB)           */
#define ATT_0      0   /*  0.0 */
#define ATT_1      1   /*  1.5 */
#define ATT_2      2   /*  3.0 */
#define ATT_3      3   /*  4.5 */
#define ATT_4      4   /*  6.0 */
#define ATT_5      5   /*  7.5 */
#define ATT_6      6   /*  9.0 */
#define ATT_7      7   /* 10.5 */
#define ATT_8      8   /* 12.0 */
#define ATT_9      9   /* 13.5 */
#define ATT_10    10   /* 15.0 */
#define ATT_11    11   /* 16.5 */
#define ATT_12    12   /* 18.0 */
#define ATT_13    13   /* 19.5 */
#define ATT_14    14   /* 21.0 */
#define ATT_15    15   /* 22.5 */
#define ATT_16    16   /* 24.0 */
#define ATT_17    17   /* 25.5 */
#define ATT_18    18   /* 27.0 */
#define ATT_19    19   /* 28.5 */
#define ATT_20    20   /* 30.0 */
#define ATT_21    21   /* 31.5 */
#define ATT_22    22   /* 33.0 */
#define ATT_23    23   /* 34.5 */
#define ATT_24    24   /* 36.0 */
#define ATT_25    25   /* 37.5 */
#define ATT_26    26   /* 39.0 */
#define ATT_27    27   /* 40.5 */
#define ATT_28    28   /* 42.0 */
#define ATT_29    29   /* 43.5 */
#define ATT_30    30   /* 45.0 */
#define ATT_31    31   /* 46.5 */
#define ATT_32    32   /* 48.0 */
#define ATT_33    33   /* 49.5 */
#define ATT_34    34   /* 51.0 */
#define ATT_35    35   /* 52.5 */
#define ATT_36    36   /* 54.0 */
#define ATT_37    37   /* 55.5 */
#define ATT_38    38   /* 57.0 */
#define ATT_39    39   /* 58.5 */
#define ATT_40    40   /* 60.0 */
#define ATT_41    41   /* 61.5 */
#define ATT_42    42   /* 63.0 */
#define ATT_43    43   /* 64.5 */
#define ATT_44    44   /* 66.0 */
#define ATT_45    45   /* 67.5 */
#define ATT_46    46   /* 69.0 */
#define ATT_47    47   /* 70.5 */
#define ATT_48    48   /* 72.0 */
#define ATT_49    49   /* 73.5 */
#define ATT_50    50   /* 75.0 */
#define ATT_51    51   /* 76.5 */
#define ATT_52    52   /* 78.0 */
#define ATT_53    53   /* 79.5 */
#define ATT_54    54   /* 81.0 */
#define ATT_55    55   /* 82.5 */
#define ATT_56    56   /* 84.0 */
#define ATT_57    57   /* 85.5 */
#define ATT_58    58   /* 87.0 */
#define ATT_59    59   /* 88.5 */
#define ATT_60    60   /* 90.0 */
#define ATT_61    61   /* 91.5 */
#define ATT_62    62   /* 93.0 */
#define ATT_63    63   /* 94.5 */


#define HEADPHONE_OFF     0
#define HEADPHONE_ON      1
#define LINE_OUT_OFF      0
#define LINE_OUT_ON       1
#define SPEAKER_OFF       0
#define SPEAKER_ON        1

/* Input gain is 1.5 dB per unit integer value */
/*                          Gain (dB)          */
#define GAIN_0     0   /*  0.0 */
#define GAIN_1     1   /*  1.5 */
#define GAIN_2     2   /*  3.0 */
#define GAIN_3     3   /*  4.5 */
#define GAIN_4     4   /*  6.0 */
#define GAIN_5     5   /*  7.5 */
#define GAIN_6     6   /*  9.0 */
#define GAIN_7     7   /* 10.5 */
#define GAIN_8     8   /* 12.0 */
#define GAIN_9     9   /* 13.5 */
#define GAIN_10   10   /* 15.0 */
#define GAIN_11   11   /* 16.5 */
#define GAIN_12   12   /* 18.0 */
#define GAIN_13   13   /* 19.5 */
#define GAIN_14   14   /* 21.0 */
#define GAIN_15   15   /* 22.5 */

#define LINE_IN           0
#define MIKE_IN           1
#define OVERANGE_ENABLE   1
#define OVERANGE_CLEAR    0

/* Monitor path attenuation = 6 dB per unit integer value */
/*                          Attenuation (dB)             */
#define MATT_0     0   /*  6.0 */
#define MATT_1     1   /* 12.0 */
#define MATT_2     2   /* 18.0 */
#define MATT_3     3   /* 24.0 */
#define MATT_4     4   /* 30.0 */
#define MATT_5     5   /* 36.0 */
#define MATT_6     6   /* 42.0 */
#define MATT_7     7   /* 48.0 */
#define MATT_8     8   /* 54.0 */
#define MATT_9     9   /* 60.0 */
#define MATT_10   10   /* 66.0 */
#define MATT_11   11   /* 72.0 */
#define MATT_12   12   /* 78.0 */
#define MATT_13   13   /* 84.0 */
#define MATT_14   14   /* 90.0 */
#define MATT_15   15   /* 96.0 (Mute Monitor Path) */
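The ATT_n and GAIN_n macros above encode a fixed step of 1.5 dB per integer code. A minimal sketch, assuming you want the nearest code for a requested level in dB, is shown below. The helpers att_code, gain_code, and db_to_code are hypothetical and are not part of CS4215.h; they simply round to the nearest step and clamp to the field width.

```c
#include <assert.h>

/* Hypothetical helper: map a level in dB to the nearest field code for a
 * CS4215 setting whose codes are spaced step_db apart, clamped to
 * 0..max_code. Levels at or below 0 dB map to code 0. */
static unsigned int db_to_code(double db, double step_db, unsigned int max_code)
{
    double steps;

    assert(step_db > 0.0);
    if (db <= 0.0)
        return 0u;
    steps = db / step_db + 0.5;          /* +0.5 then truncate = round (db > 0) */
    if (steps >= (double)max_code + 1.0)
        return max_code;                 /* clamp to the largest code */
    return (unsigned int)steps;
}

/* Output attenuation code, 0..63 (ATT_0..ATT_63), 1.5 dB per code. */
unsigned int att_code(double atten_db)
{
    return db_to_code(atten_db, 1.5, 63u);
}

/* Input gain code, 0..15 (GAIN_0..GAIN_15), 1.5 dB per code. */
unsigned int gain_code(double gain_db)
{
    return db_to_code(gain_db, 1.5, 15u);
}
```

For example, att_code(6.0) yields 4, the value of ATT_4, and any request beyond 94.5 dB is clamped to ATT_63.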
Example 8-16. CS4215.C 


/*****************************************************************/
/* cs4215.c                                                      */
/* staff    05-13-92                                             */
/* (C) Texas Instruments Inc., 1992                              */
/*                                                               */
/* Refer to the file 'license.txt' included with this            */
/* package for usage and license information.                    */
/*****************************************************************/
/*****************************************************************/
/* CS4215.C                                                      */
/*****************************************************************/
/* TMS320C3X - CRYSTAL 4215 MM CODEC                             */
/* TMS320C3x CODE                                                */
/* Compile and archive into CS4215.lib                           */
/*****************************************************************/
/* Leor Brenman, DSP Applications                                */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                           */
/*****************************************************************/
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <cs4215.h>

/*****************************************************************/
/* GLOBAL VARIABLES                                              */
/*****************************************************************/
int         buffer_size  = BLOCK_SIZE; /* SIZE OF I/O BUFFER(S)           */
VPVF        output0;         /* OUTPUT DATA BUFFER FOR PROCESSOR         */
VPVF        input0;          /* INPUT DATA BUFFER FOR PROCESSOR          */
VPVF        output_xfer0;    /* OUTPUT DATA BUFFER FOR ISR/CODEC         */
VPVF        input_xfer0;     /* INPUT DATA BUFFER FOR ISR/CODEC          */
VPVF        output1;         /* OUTPUT DATA BUFFER FOR PROCESSOR         */
VPVF        input1;          /* INPUT DATA BUFFER FOR PROCESSOR          */
VPVF        output_xfer1;    /* OUTPUT DATA BUFFER FOR ISR/CODEC         */
VPVF        input_xfer1;     /* INPUT DATA BUFFER FOR ISR/CODEC          */
VI          buffer_rdy   = FALSE; /* CPU-ISR COMM FLAG (INPUT)           */
VI          buffer_index = 0;     /* INDEX INTO INPUT AND OUTPUT ARRAYS  */
VI          first_half   = TRUE;  /* FIRST/SECOND HALF OF 64-BIT FRAME   */
VI          i;               /* GENERIC COUNTER VARIABLE                 */
CS4215_WORD data_control;
#if C_ISR 
/*========================================================================*/
/* C_INT06() OR C_INT08()                                                 */
/* SERIAL PORT 0/1 RECEIVE INTERRUPT SERVICE ROUTINE                      */
/*========================================================================*/
#if SER_NUM
void c_int06(void) {}
void c_int08(void)
#else
void c_int08(void) {}
void c_int06(void)
#endif
{
  VPVF swap;
  CS4215_WORD in, out;

  if (first_half)               /* First half of the 64-bit transmission */
  {
    first_half = FALSE;

    in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    input_xfer0[buffer_index] = in.stereo_16._bitval.right;
    input_xfer1[buffer_index] = in.stereo_16._bitval.left;

    out.stereo_16._bitval.left  = output_xfer1[buffer_index];
    out.stereo_16._bitval.right = output_xfer0[buffer_index];
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];

    if (++buffer_index == buffer_size)
    {
      /* Block complete: exchange ISR and processor buffers */
      swap = input0;   input0  = input_xfer0;   input_xfer0  = swap;
      swap = input1;   input1  = input_xfer1;   input_xfer1  = swap;
      swap = output0;  output0 = output_xfer0;  output_xfer0 = swap;
      swap = output1;  output1 = output_xfer1;  output_xfer1 = swap;
      buffer_index = 0;
      buffer_rdy   = TRUE;
    }
  }
  else                          /* Second half of transmission */
  {
    SERIAL_PORT_ADDR(SER_NUM)->r_data;  /* read and discard second word */
    SERIAL_PORT_ADDR(SER_NUM)->x_data = data_control._intval[1];
    first_half = TRUE;
  }
}

#endif /* C_ISR */


/*========================================================================*/
/* INIT_ARRAYS(): INITIALIZE DATA ARRAY PARAMETERS                        */
/*========================================================================*/
void init_arrays(int buffer_size)
{
  int i;

  /*--------------------------------------------------------------------*/
  /* INITIALIZE AND ZERO FILL ARRAYS                                    */
  /*--------------------------------------------------------------------*/
  if (!(input0       = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(output0      = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(input_xfer0  = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(output_xfer0 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(input1       = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(output1      = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(input_xfer1  = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if (!(output_xfer1 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();

  for (i = 0; i < buffer_size; i++)
  {
    output0[i] = output_xfer0[i] = 0.0;
    output1[i] = output_xfer1[i] = 0.0;
  }
}
/*========================================================================*/
/* INIT_4215(): INITIALIZE COMMUNICATIONS TO CS4215                       */
/* NOTE: i IS A VOLATILE TO FORCE TIME DELAYS AND TO FORCE                */
/*       READS OF SERIAL PORT DATA RECEIVE REGISTER TO CLEAR              */
/*       THE RECEIVE INTERRUPT FLAG                                       */
/*========================================================================*/
void init_4215(int crystal, int sample_rate)
{
  VI i, j, dummy;
  CS4215_WORD temp, in, out;

  RESET_CODEC;                  /* RESET CODEC                            */
  WAIT(50);                     /* KEEP RESET LOW FOR SOME PERIOD OF TIME */

  /*----------------------------------------------------------------------*/
  /* CONFIGURE SERIAL PORT SER_NUM                                        */
  /*----------------------------------------------------------------------*/
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol      = 0x0;
  SERIAL_PORT_ADDR(SER_NUM)->s_x_control   = CLKXFUNC | DXFUNC | FSXFUNC;
  SERIAL_PORT_ADDR(SER_NUM)->s_r_control   = CLKRFUNC | DRFUNC | FSRFUNC;
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = XGO | XHLD_ | XCP_ | XCLKSRC;

  /* THE FOLLOWING PERIOD REGISTER VALUE HAS BEEN TESTED ON A 50-MHz C30 */
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_period_bit.x_period = 0x3;

  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM | RFSM |
                                        RLEN_32 | XINT | RINT |
                                        FSXOUT | RRESET | XRESET;

  /* BUILD CONTROL WORDS                           */
  /* ALL BITS ARE 0 EXCEPT THOSE DEFINED OTHERWISE */
  temp._intval[0] = temp._intval[1] = 0;
  temp.control._bitval.st   = STEREO_MODE;
  temp.control._bitval.dfr  = sample_rate;
  temp.control._bitval.xclk = 1;
  temp.control._bitval.mckf = crystal;
  temp.control._bitval.pio  = 3;

  /* BUILD DATA CONTROL WORD */
  data_control._intval[0] = data_control._intval[1] = 0;
  data_control.stereo_16._bitval.lo  = ATT_0;
  data_control.stereo_16._bitval.le  = ON;
  data_control.stereo_16._bitval.ro  = ATT_0;
  data_control.stereo_16._bitval.ovr = ON;
  data_control.stereo_16._bitval.ma  = MATT_15;
  UN_RESET_CODEC;               /* PULL 4215 OUT OF RESET */
  DCB_LOW;

  /* Write out control word until DCB bit is low */
  do
  {
    out = temp;
    for (i = 0; i < 5; i++)
    {
      while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xsrempty == 1);
      SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;

      /* See note on XRESET/RRESET and three cycle delay in C3x U.G. */
      for (j = 0; j < 3; j++);

      SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM |
                                            RFSM | RLEN_32 | XINT | RINT |
                                            FSXOUT | RRESET | XRESET;
      dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
      SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];

      /* See note on XRDY and three cycle delay in C3x U.G. */
      for (j = 0; j < 3; j++);
      while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
      SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[1];

      while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
      in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;

      /* See note on RRDY and three cycle delay in C3x U.G. */
      for (j = 0; j < 3; j++);
      while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
      in._intval[1] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    }
  } while (in.control._bitval.dcb != 0);

  /* Write out control word twice with the DCB bit high */
  temp.control._bitval.dcb = 1;
  out = temp;
  for (i = 0; i < 2; i++)
  {
    while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xsrempty == 1);
    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;

    /* See note on XRESET/RRESET and three cycle delay in C3x U.G. */
    for (j = 0; j < 3; j++);

    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM |
                                          RFSM | RLEN_32 | XINT | RINT |
                                          FSXOUT | RRESET | XRESET;
    dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];

    /* See note on XRDY and three cycle delay in C3x U.G. */
    for (j = 0; j < 3; j++);
    while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[1];

    while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
    in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;

    /* See note on RRDY and three cycle delay in C3x U.G. */
    for (j = 0; j < 3; j++);
    while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
    in._intval[1] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
  }

  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XLEN_32 | RLEN_32 | XFSM | RFSM |
                                        RRESET | XRESET | XCLKSRCE;
  while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
  SERIAL_PORT_ADDR(SER_NUM)->x_data = 0;

  /* See note on XRDY and three cycle delay in C3x U.G. */
  for (j = 0; j < 3; j++);
  while (SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
  SERIAL_PORT_ADDR(SER_NUM)->x_data = data_control._intval[1];

  dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol |= XINT | RINT;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol &= ~XCLKSRCE;
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = 0;

  CL_INT_FL_REG;
#if SER_NUM
  EN_SER_PORT_RCV_INT_1;
#else
  EN_SER_PORT_RCV_INT_0;
#endif
  EN_GLOBAL_INTS;
  DCB_HI;
}
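The receive ISR in Example 8-16 fills the transfer arrays one sample per frame and, when buffer_index reaches buffer_size, exchanges each transfer pointer with its processing counterpart and sets buffer_rdy. The following host-side sketch models only that pointer exchange for one channel. The pingpong_t type and its functions are hypothetical; they perform no serial-port access and are meant only to illustrate the double-buffering scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the double-buffer swap in the ISR. */
typedef struct {
    float *work;    /* buffer the main program processes (like input0)   */
    float *xfer;    /* buffer the ISR is filling (like input_xfer0)      */
    size_t size;    /* samples per buffer (like buffer_size)             */
    size_t index;   /* next slot in xfer (like buffer_index)             */
    int ready;      /* set when a full block is available (buffer_rdy)   */
} pingpong_t;

void pingpong_init(pingpong_t *pp, float *a, float *b, size_t size)
{
    assert(size > 0);
    pp->work = a;
    pp->xfer = b;
    pp->size = size;
    pp->index = 0;
    pp->ready = 0;
}

/* Called once per received sample, as the ISR does on each first half-frame. */
void pingpong_put(pingpong_t *pp, float sample)
{
    pp->xfer[pp->index] = sample;
    if (++pp->index == pp->size) {
        float *swap = pp->work;   /* exchange roles of the two buffers */
        pp->work = pp->xfer;
        pp->xfer = swap;
        pp->index = 0;
        pp->ready = 1;            /* main loop would clear this after processing */
    }
}
```

Swapping pointers rather than copying samples keeps the per-interrupt cost constant, which is why the ISR can hand over a whole block in a few instructions.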


8.7 Software UART Emulator for the TMS320C3x 

By using the general-purpose I/O pins in conjunction with two timers and an
external interrupt, you can develop a very flexible full-duplex universal
asynchronous receiver/transmitter (UART) emulator in software. This solution
discusses the implementation of an interrupt-driven, 9600-baud UART with eight
data bits, one stop bit, and no parity. This solution was contributed by Ted Fried
of Advanced Computer Communications.


8.7.1 Hardware 


The hardware interface is relatively straightforward (see Figure 8-12). The
receive line is connected to both the INT0 and XF1 pins, so the falling edge of
the start bit triggers an interrupt. The transmit line is connected to the
XF0 pin and a pullup resistor.


8.7.2 Software 


As shown in Example 8-17, the receive sequence begins when the start bit
triggers the external interrupt. The interrupt service routine, RX_INT0, loads
timer0 with a value that produces a delay of one half of the bit time. The
routine then loads the timer's interrupt vector, starts the timer, and exits to
the main program. When the timer triggers its interrupt, RX_TMR_INT, the main
body of the receive code executes. At this time, the line is in the middle of
the start bit. The CPU then samples XF1 (through the IOF register) and verifies
that the start bit has been read in. If the start bit is verified, the timer is
loaded with the full-bit time and started, and the procedure exits to the main
program.
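The constants half_bit_time and whole_bit_time at the top of Example 8-17 set the timer0 period for a half-bit and a full-bit delay. The sketch below estimates such constants under stated assumptions: the timer uses its internal clock source in pulse mode, so interrupts occur at f(H1)/(2 x period) with f(H1) = CLKIN/2. Check this relationship against the timer description in the TMS320C3x User's Guide before relying on it. The function name timer_period is hypothetical.

```c
#include <assert.h>

/* Sketch: TMS320C3x timer period register value for a periodic interrupt
 * at rate_hz, ASSUMING internal clock source and pulse mode, i.e.
 * interrupt rate = f(H1) / (2 * period) with f(H1) = CLKIN / 2, so
 * period = CLKIN / (4 * rate). Rounds to the nearest integer period. */
unsigned long timer_period(double clkin_hz, double rate_hz)
{
    assert(clkin_hz > 0.0 && rate_hz > 0.0);
    return (unsigned long)(clkin_hz / (4.0 * rate_hz) + 0.5);
}
```

With CLKIN = 33 MHz, timer_period(33e6, 9600.0) gives 859 for a full bit, and timer_period(33e6, 2.0 * 9600.0) gives 430 for a half bit. These are close to the listing's 0358h (856) and 01ADh (429); the difference is well under 1 percent of the bit time.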

On successive timer0 interrupts (RX_TMR_INT), the received bits are shifted into
a storage area in memory until a complete byte has been read in. On the ninth
interrupt, if the stop bit is verified, the routine executes a software trap to
inform the main program of the byte reception. If the stop bit is not verified,
the routine branches to BAD_STOP_BIT, where the appropriate action is taken.
After the received byte is processed, the external interrupt is reenabled and
the system waits for the next start bit.
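The bit-level behavior of the receive path can be modeled in C on a host. The sketch below assumes the line level has already been sampled at the middle of each of the ten bit cells (start bit, eight data bits LSB first, stop bit). It mirrors the RORC and LSH -24 sequence in RX_TMR_INT by shifting each new bit into bit 31 and extracting the top byte at the end. The function uart_rx_frame and its return codes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RX_TMR_INT logic. mid_samples[0] is the start
 * bit, [1..8] the data bits (LSB first), [9] the stop bit.
 * Returns 0 and stores the byte, -1 for a bad start bit, -2 for a bad stop bit. */
int uart_rx_frame(const int mid_samples[10], uint8_t *out)
{
    uint32_t shift = 0;
    int bit;

    assert(out != 0);
    if (mid_samples[0] != 0)
        return -1;                    /* start bit must be low */
    for (bit = 1; bit <= 8; bit++) {
        /* like RORC: new bit enters at bit 31, earlier bits move right */
        shift = (shift >> 1) | ((uint32_t)(mid_samples[bit] != 0) << 31);
    }
    if (mid_samples[9] == 0)
        return -2;                    /* stop bit must be high (BAD_STOP_BIT) */
    *out = (uint8_t)(shift >> 24);    /* like LSH -24: first data bit -> bit 0 */
    return 0;
}
```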

The transmit routine begins when the main program loads a byte into the holding
register and then calls TX_MAIN. This procedure loads timer1 with the full-bit
time value, resets the transmit counter, sends the start bit, and starts the
timer. The routine then exits back to the main program. The main program does
not request another byte transmission until it finds the transmit counter equal
to 0. On each subsequent timer1 interrupt (TX_INT), the routine shifts out the
next bit of the transmit byte, including the stop bit, until the transmit
counter reaches 0.
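A matching host-side sketch of the transmit framing, assuming the same 8-N-1 format, follows. As in TX_MAIN, the start bit is driven low directly. As in TX_INT, each later bit is taken from bit 0 of a shift word that is then shifted right. In this sketch, bit 8 of the shift word is set so that a 1 (the stop bit) follows the eight data bits. The function uart_tx_frame is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of TX_MAIN/TX_INT framing for 8-N-1:
 * levels[0] is the start bit, levels[1..8] the data bits (LSB first),
 * levels[9] the stop bit. */
void uart_tx_frame(uint8_t byte, int levels[10])
{
    uint32_t word = (uint32_t)byte | 0x100u;  /* bit 8 set: stop bit = 1 */
    int k;

    assert(levels != 0);
    levels[0] = 0;                            /* start bit: line driven low */
    for (k = 1; k <= 9; k++) {
        levels[k] = (int)(word & 1u);         /* bit 0 goes out next, like RORC into carry */
        word >>= 1;
    }
}
```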
Example 8-17. Full Duplex UART Emulator for TMS320C3x

half_bit_time    .set    01ADh          ; assume 33-MHz TMS320C3x
whole_bit_time   .set    0358h
timer_go         .set    03C1h
timer_setup      .set    0?Dlh
int_setup        .set    0301h
iof_setup        .set    06h

timer0_vector    .word   RX_TMR_INT     ; interrupt vector addresses
timer1_vector    .word   TX_INT
rx_int_vector    .word   RX_INT0

timer0_period    .word   0808028h       ; on-chip peripheral and RAM locations
timer1_period    .word   0808038h
timer0_control   .word   0808020h
timer1_control   .word   0808030h
timer0_int_vect  .word   0809FC9h
timer1_int_vect  .word   0809FCAh
int0_vector      .word   0809FC1h
rx_byte          .word   0809FE8h
tx_byte          .word   0809FE9h
rx_counter       .word   0809FFAh
tx_counter       .word   0809FFBh

; Main setup for asynchronous serial interface to be run at powerup.

SETUP_ASYNCH:
        PUSH    AR7
        OR      iof_setup, IOF          ; IOF setup and IOF0 = 1
        LDI     timer_setup, AR7        ; set up timer0 and timer1
        STI     AR7, @timer0_control
        STI     AR7, @timer1_control
        LDI     rx_int_vector, AR7      ; load into interrupt vector
        STI     AR7, @int0_vector
        OR      int_setup, IE           ; enable interrupts
        POP     AR7
        RETS

; Start bit received, external interrupt service routine

RX_INT0:
        PUSH    AR7
        XOR     01h, IE                 ; disable INT0
        LDI     half_bit_time, AR7
        STI     AR7, @timer0_period     ; rx_timer period
        LDI     timer0_vector, AR7
        STI     AR7, @timer0_int_vect   ; rx_timer int vector
        LDI     timer_go, AR7
        STI     AR7, @timer0_control    ; start rx_timer
        LDI     0Ah, AR7
        STI     AR7, @rx_counter        ; reset rx_counter
        POP     AR7
        RETI
; Timer0 interrupt service routine for byte reception.

RX_TMR_INT:
        PUSH    AR7
        LDI     @rx_counter, AR7
        CMPI    09h, AR7                ; are we at start bit?
        BNE     STOP                    ; nope, check for stop bit
        CMPI    080h, IOF               ; check rx_bit (IOF1)
        BLT     OK                      ; if less than 80h (IOF1 = 0)?
        OR      01h, IE                 ; bad start bit, reenable INT0
        BR      CLEANUP2                ; go back to main (only AR7 pushed)
OK:     SUBI    01h, AR7                ; decrement rx_counter
        STI     AR7, @rx_counter        ; update counter in memory
        LDI     whole_bit_time, AR7
        STI     AR7, @timer0_period     ; load bit time into rx_timer
        LDI     timer_go, AR7
        STI     AR7, @timer0_control    ; start rx_timer
        POP     AR7
        RETI

STOP:   PUSH    AR6
        LDI     @rx_byte, AR6
        DBNZ    AR7, NEXT               ; if rx_count != 0, get next bit
        CMPI    080h, IOF               ; check rx_bit (IOF1)
        BLT     BAD_STOP_BIT            ; go to invalid stop bit module
        LSH     -24, AR6                ; shift rx_byte 24 bits right
        STI     AR6, @rx_byte           ; TRAP RECEIVED BYTE!!
        OR      01h, IE                 ; reenable INT0
        BR      CLEANUP

NEXT:   CMPI    080h, IOF               ; check rx_bit (IOF1)
        OR      01h, ST                 ; force carry flag to 1
        BGE     ONE                     ; if rx_bit = 1
        XOR     01h, ST                 ; set carry flag to 0
ONE:    RORC    AR6                     ; shift in carry bit
        STI     AR6, @rx_byte           ; update rx_byte in memory
        STI     AR7, @rx_counter        ; update counter in memory
        LDI     timer_go, AR6
        STI     AR6, @timer0_control    ; start rx_timer
CLEANUP:
        POP     AR6
CLEANUP2:
        POP     AR7
        RETI
; Transmit byte main subroutine

TX_MAIN:
        PUSH    AR7
        LDI     whole_bit_time, AR7
        STI     AR7, @timer1_period     ; load timer period
        LDI     timer1_vector, AR7
        STI     AR7, @timer1_int_vect   ; tx_timer int vector
        LDI     @tx_byte, AR7
        OR      0FE00h, AR7             ; mask stop bit to tx_byte
        STI     AR7, @tx_byte           ; update tx_byte
        AND     0EBh, IOF               ; send out '0' to IOF0
        LDI     0Ah, AR7
        STI     AR7, @tx_counter        ; load counter in memory
        LDI     timer_go, AR7
        STI     AR7, @timer1_control    ; start tx_timer
        POP     AR7
        RETS

; Timer1 interrupt service routine for byte transmission.

TX_INT:
        PUSH    AR7
        LDI     @tx_counter, AR7        ; load in tx_counter from mem
        DBNZ    AR7, NEXT_OUT           ; if tx_counter not zero
        POP     AR7
        RETI

NEXT_OUT:
        PUSH    AR6
        LDI     timer_go, AR6           ; use AR6 so AR7 keeps tx_counter
        STI     AR6, @timer1_control    ; start tx_timer
        LDI     @tx_byte, AR6           ; load in tx_byte from mem
        RORC    AR6                     ; next bit out is in carry
        BNC     OUT_ZERO                ; carry = 0, then send out '0'
        OR      04h, IOF                ; send out '1' to IOF0
        BR      CLEANUPS
OUT_ZERO:
        AND     0EBh, IOF               ; send out '0' to IOF0
CLEANUPS:
        STI     AR6, @tx_byte           ; update byte in memory
        STI     AR7, @tx_counter        ; update counter in memory
        POP     AR6
        POP     AR7
        RETI
8.8 Hardware UART for TMS320C3x

Section 8.7 discusses a software UART emulator, which allows the ’C3x to perform
asynchronous communication. Some applications, however, require a hardware
UART. This section describes one possible design for a hardware UART (see
Figure 8-12). This design, originally implemented in a field-programmable gate
array (FPGA), can easily be transferred to an application-specific integrated
circuit (ASIC). You can modify this design to accommodate faster data rates or
different communication protocols.

Figure 8-12. TMS320C3x Serial Port to UART Interface 


Figure 8-13 shows a 9,600-baud UART with one stop bit and one start bit. The 
clock signal, H3, is supplied to the circuit from the ’C3x. The DSP uses a 
25-MHz clock. 


Figure 8-13. Transmit Circuitry

[Figure: CLKX0 and XEN sampled by H3-clocked D flip-flops, stop-bit generation, and a modulus 8 binary counter]



The ’C3x serial port transmit circuitry, shown in Figure 8-13, is configured to
output eight bits of data at a rate of approximately 9.6 kHz. This rate is
achieved by programming one of the ’C30’s internal timers to the desired
9.6-kHz frequency. The transmitting serial port is configured in fixed burst
mode, so the leading FSX signal of each transfer helps initiate the start bit
required by the UART protocol. The UART circuitry generates the stop bit at the
end of the eighth data bit.

The receive circuitry of the UART, shown in Figure 8-14, is activated when the
circuit detects the start bit, which is a logical 0. The delay circuit is
activated on the falling edge of the start bit. The delay causes the incoming
data bits to be sampled in the middle of each bit, thus increasing the UART’s
noise immunity.


Figure 8-14. Receive Circuitry 



After the delay is performed, the timer is activated. The timer has a period of
104 µs, which corresponds to a baud rate of approximately 9600. At each bit
time, a data value is sampled into an 8-bit shift register. After all eight bits
are received, the data is passed to the ’C30 over the serial port at 1/8 of the
H3 clock rate. The FPGA circuitry interfaces to the ’C30 serial port in fixed
burst mode and generates both the clock and the frame sync signals.
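The timing relationships in this design reduce to simple arithmetic. The following Python sketch computes the bit period, the half-bit delay that places sampling in the middle of each bit, and how many H3 cycles these intervals span. The helper names and the 12.5-MHz H3 frequency are assumptions for illustration only; substitute the clock rate of your own board.

```python
# Hypothetical helpers for checking the 9600-baud hardware UART timing.

BAUD = 9600  # bits per second, as in Figure 8-13


def bit_period_s(baud: float) -> float:
    """Duration of one bit cell, in seconds."""
    return 1.0 / baud


def mid_bit_delay_s(baud: float) -> float:
    """Delay from the start-bit falling edge to the middle of the start bit."""
    return 0.5 * bit_period_s(baud)


def h3_cycles(seconds: float, h3_hz: float) -> float:
    """Number of H3 clock cycles that span the given interval."""
    return seconds * h3_hz


if __name__ == "__main__":
    H3_HZ = 12.5e6  # ASSUMED H3 frequency; replace with your board's value
    t_bit = bit_period_s(BAUD)
    print(f"bit period     = {t_bit * 1e6:.2f} us")
    print(f"mid-bit delay  = {mid_bit_delay_s(BAUD) * 1e6:.2f} us")
    print(f"H3 cycles/bit  = {h3_cycles(t_bit, H3_HZ):.1f}")
    # One received byte is shifted to the 'C3x serial port at H3/8:
    # 8 serial clocks of 8 H3 cycles each.
    print(f"byte transfer  = {8 * 8 / H3_HZ * 1e6:.2f} us")
```

The bit period evaluates to about 104.17 µs, consistent with the 104-µs timer period, and the mid-bit delay to about 52.08 µs.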

This UART circuitry can also easily be implemented as an ASIC or incorporated
into a custom digital signal processor (cDSP). The circuit can be modified for
different serial communication protocols or higher baud rates.
Chapter 9


Clock Oscillator and Ceramic Resonators 


This chapter provides a general background on oscillators as well as information
regarding crystal and ceramic resonators, their frequency characteristics, and
the type of oscillator circuit used on the ’C3x. Also covered are design aspects
of the ’C3x oscillator, including appropriate configuration of the external
components, measured parameters for the on-board portion of the circuitry, use
of the oscillator with overtone crystals, and general design considerations for
choosing the external components for the oscillator. Finally, this chapter shows
some design solutions for common frequencies.
9.1 Oscillators


The ’C3x is a member of the Texas Instruments family of high-speed DSPs and can
perform operations at a rate of up to 30 million instructions per second
(MIPS). The wide variety of DSP applications requires a wide range of clocking
frequencies, and the ’C3x allows considerable flexibility in meeting these
clocking requirements.

The ’C3x provides two modes for clock generation and control for use with
different application needs. These include:

□ External clock input with the capability to divide the clock frequency by 2 

□ Internal clock generation from an on-board oscillator with no external clock 
necessary (’C30 and ’C31 only) 

The built-in oscillator provides a method for accurate clock generation that
requires few external components (a crystal or ceramic resonator and two load
capacitors). This saves board space and reduces system cost.

On the ’C3x devices, the on-board oscillator operates in a divide-by-2 mode. 
In this mode, the frequency of H1 or H3 (which indicates the actual machine 
cycles of the processor) is one half of the oscillator frequency. 
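As a quick illustration of the divide-by-2 relationship, the following Python sketch converts a few crystal frequencies into the resulting H1/H3 frequency and machine-cycle time. The frequencies are arbitrary examples, not a list of supported parts, and `machine_clock_hz` and `machine_cycle_ns` are hypothetical helper names.

```python
def machine_clock_hz(oscillator_hz: float) -> float:
    """H1/H3 frequency: the on-board oscillator frequency divided by 2."""
    return oscillator_hz / 2.0


def machine_cycle_ns(oscillator_hz: float) -> float:
    """Duration of one H1/H3 machine cycle, in nanoseconds."""
    return 1e9 / machine_clock_hz(oscillator_hz)


for f_osc in (20e6, 40e6, 50e6, 60e6):  # example crystal frequencies
    print(f"{f_osc / 1e6:5.1f} MHz crystal -> H1 = {machine_clock_hz(f_osc) / 1e6:5.2f} MHz, "
          f"cycle = {machine_cycle_ns(f_osc):5.1f} ns")
```

With a 60-MHz crystal, for instance, H1 runs at 30 MHz, giving a 33.3-ns machine cycle.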

9.1.1 Recommendations for Oscillator Use 

The ’C3x family of devices provides several clock generation options based on
cost, component count, and the required clock frequency for the application.
The oscillator clocking option on the ’C3x provides a low-cost method of clock
generation with as few as three external components (one crystal and two load
capacitors), which helps to minimize the board space consumed for clock
generation. The crystal or ceramic resonator used determines the frequency of
operation. This frequency can extend up to 60 MHz with third-overtone crystals.

CMOS-compatible integrated-circuit crystal oscillators are available across a
wide frequency range. These are more expensive than the internal oscillator and
usually consume more space on the board. CMOS oscillators also become more
expensive with higher operating frequency.
9.2 Quartz Crystal and Ceramic Resonators

All oscillators require resonating components to determine the frequency of
oscillation. A resonating component responds more strongly within a certain
frequency range than at frequencies outside that range. A simple resonator
consists of an inductor (L) and a capacitor (C). These components resonate at,
or favor, the frequency at which their individual reactances cancel each other.
Figure 9-1 shows a simple series-LC resonator with impedance equations.

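Figure 9-1. Series-LC Schematic

(Schematic: inductor Lx in series with capacitor Cx.)

The impedance equations for the series-LC schematic are as follows:

$$Z_L = j\omega L, \qquad Z_C = \frac{1}{j\omega C}, \qquad Z_T = Z_L + Z_C = j\left(\omega L - \frac{1}{\omega C}\right)$$

$|Z_T|$ is minimum where $\omega L = 1/(\omega C)$, so

$$\omega_s^2 = \frac{1}{LC}, \qquad \omega_s = \frac{1}{\sqrt{LC}}$$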

Consider the impedance of the series combination of these components. The
impedance of the inductor is $Z_L = j\omega L$, where $\omega$ is the angular
frequency ($\omega = 2\pi f$), and the impedance of the capacitor is
$Z_C = 1/(j\omega C)$. The total impedance of the inductor-capacitor combination
is $Z_T = Z_L + Z_C = j(\omega L - 1/(\omega C))$. Therefore, the magnitude of
the combined impedance of these two components is a minimum at the frequency
where $\omega L = 1/(\omega C)$. This frequency ($\omega_s$) is the resonant
frequency and is determined by:

$$\omega_s = \frac{1}{\sqrt{LC}}$$

Although oscillators frequently use various combinations of inductors and
capacitors as resonating elements, the accuracy of frequency control with these
components is limited. Changes in the values of L and C due to tolerance
limitations and changes in the environment (such as temperature) strongly
affect the frequency of the oscillator. Many applications in digital systems
require precise clock timing and need more accurate resonators. Quartz crystal
and ceramic resonators provide more stable and precise frequency control.
9.2.1 Behavior and Operation of Quartz Crystal and Ceramic Resonators

The oscillator circuitry built into the ’C3x devices is designed for use with a
quartz crystal or ceramic resonator as the frequency-controlling element.

Quartz crystal and ceramic resonators are resonating components made with
materials that have specific piezoelectric properties. Piezoelectric materials
deform mechanically in the presence of an electric potential; conversely,
mechanical stress on the material produces a voltage. This property makes a
very stable resonator possible, since the frequency of mechanical vibration is
controlled precisely by the size, shape, and material properties of the crystal
or ceramic used. In fact, many quartz crystal resonators are so precise that
they operate within 10 parts per million (ppm) of the intended frequency.

Ceramic resonators are similar to quartz crystal resonators in physical
structure, but they are made from a polycrystalline ceramic instead of
monocrystalline quartz. The production process for the ceramic is much less
expensive than for quartz, reducing the final cost of the resonator. However,
the polycrystalline structure of the ceramic vibrates within a wider range of
frequency than a quartz crystal does, and consequently, the frequency control is
not as precise as it is with quartz. While quartz crystal resonators can operate
within 10 ppm of the intended frequency, ceramic resonators generally operate
within 5000 ppm. However, if a tolerance tighter than ±5000 ppm is not
necessary, ceramic resonators are a cost-effective alternative. Table 9-1 shows
a comparison of three types of resonators.

Table 9-1. Comparison of Resonator Types

Type      Relative Price   Adjustment      Frequency Tolerance   Long-Term Stability
LC        Very low         Necessary       ± 20000 ppm           Fair
Ceramic   Low              Not necessary   ± 5000 ppm            Excellent
Crystal   High             Not necessary   ± 10 ppm              Excellent
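The ppm tolerances in Table 9-1 are easier to compare once converted to hertz. The following Python sketch does this for an illustrative 40-MHz nominal frequency; the frequency and the helper name `tolerance_hz` are assumptions, while the tolerances come from the table.

```python
def tolerance_hz(nominal_hz: float, tolerance_ppm: float) -> float:
    """Absolute frequency error, in Hz, for a +/- tolerance given in ppm."""
    return nominal_hz * tolerance_ppm * 1e-6


NOMINAL_HZ = 40e6  # illustrative 40-MHz resonator

# Tolerances taken from Table 9-1.
for kind, ppm in (("LC", 20000), ("Ceramic", 5000), ("Crystal", 10)):
    err = tolerance_hz(NOMINAL_HZ, ppm)
    print(f"{kind:8s} +/-{ppm:>6} ppm -> +/-{err / 1e3:8.1f} kHz")
```

At 40 MHz, ±10 ppm corresponds to ±400 Hz, while ±5000 ppm corresponds to ±200 kHz.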


This document assumes that a quartz crystal is being used as the resonator;
however, the information applies equally to ceramic resonators, unless
otherwise specified.

Figure 9-2 shows a circuit model that is equivalent to a crystal. The graphs
illustrate the behavior of the magnitude of the crystal impedance and the
reactance of the crystal with frequency. The three components, Lx, Rx, and Cx,
model the electrical behavior related to the mechanical vibration of the
crystal. Lx and Cx control the resonant frequency according to the same
equation shown in Figure 9-1. Rx models the mechanical energy loss in the
crystal and is related to the power dissipation in the crystal. Co is the
capacitance of the two electrodes, which are physically separated by the quartz
dielectric. Together, these components are a reasonably accurate electrical
model for the behavior of the crystal. Values for these model components are
usually available from the crystal manufacturer.


Figure 9-2. Crystal Equivalent Circuit Model

(Schematic: Lx, Rx, and Cx in series, shunted by Co.)

Notes: 1) Co is the capacitance of the two electrodes.

2) Lx, Rx, and Cx model the electrical behavior related to the mechanical vibration of
the crystal; Lx and Cx control the resonant frequency according to the same equation
shown in Figure 9-1 and Rx models the mechanical energy loss in the crystal.

Like the series LC resonator, crystals have an impedance minimum at a frequency
determined by Lx and Cx. This is the series-resonant frequency (fs). The
presence of Co also introduces an impedance maximum at a frequency determined
by Lx resonating with the series combination of Cx and Co. This frequency is the
parallel-resonant frequency (fp). A graph of impedance magnitude that
illustrates this behavior is shown in Figure 9-3. The series-resonant frequency
corresponds to the natural mechanical vibration frequency of the crystal. The
parallel-resonant frequency is basically an electrical measurement phenomenon
that results from the resonance between Lx and Co in the electrical model of the
crystal and does not occur naturally. Consequently, all crystal oscillators
operate at or near their series-resonant frequency.
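Using the equivalent-circuit model, both resonances can be computed directly: fs from Lx and Cx, and fp from Lx and the series combination of Cx and Co. The Python sketch below uses hypothetical motional parameters for a crystal near 20 MHz; real values come from the crystal manufacturer.

```python
import math


def series_resonance(Lx: float, Cx: float) -> float:
    """f_s: Lx resonating with the motional capacitance Cx."""
    return 1.0 / (2 * math.pi * math.sqrt(Lx * Cx))


def parallel_resonance(Lx: float, Cx: float, Co: float) -> float:
    """f_p: Lx resonating with Cx in series with the electrode capacitance Co."""
    c_series = Cx * Co / (Cx + Co)
    return 1.0 / (2 * math.pi * math.sqrt(Lx * c_series))


# Hypothetical motional parameters for a crystal near 20 MHz.
Lx, Cx, Co = 3.17e-3, 20e-15, 5e-12
fs = series_resonance(Lx, Cx)
fp = parallel_resonance(Lx, Cx, Co)
print(f"fs = {fs / 1e6:.4f} MHz, fp = {fp / 1e6:.4f} MHz, "
      f"fp - fs = {(fp - fs) / 1e3:.1f} kHz")
```

Because fp/fs = sqrt(1 + Cx/Co), and Cx is much smaller than Co, fp lies only slightly above fs: roughly 40 kHz higher for these values.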


Figure 9-3. Impedance Characteristics of Crystal

(Plot: impedance magnitude versus frequency.)

Notes: 1) fs = series-resonant frequency
2) fp = parallel-resonant frequency
The graph in Figure 9-3 illustrates the behavior of the magnitude of the
impedance of the crystal, but the crystal’s phase response is also important in
oscillator design. Figure 9-4 shows the reactance of the crystal with frequency.
The reactance (and consequently the phase) is 0 at the series-resonant
frequency (fs), because at this frequency the reactances of Lx and Cx cancel
each other. At this frequency, the total impedance of the crystal is equal to
the resistance Rx.

Figure 9-4. Reactance Characteristics of Crystal 



Notes: 1) fs = series-resonant frequency
2) fp = parallel-resonant frequency

Below fs, the crystal appears capacitive (negative reactance). Between fs and
fp, the crystal appears inductive (positive reactance), and above fp the
crystal appears capacitive again. In an oscillator circuit, the crystal is
always operated at or slightly above the series-resonant frequency, in the
inductive region. The capacitance Co has little effect on the series-resonant
point (fs), but in combination with the external load on the crystal, Co affects
the parallel-resonant point (fp). To simplify circuit analysis, Co is sometimes
considered part of the external load on the crystal.

When ordering a crystal, you must tell the manufacturer whether a
series-resonant or parallel-resonant crystal is required. The meaning of these
terms is slightly different from the series- and parallel-resonant frequency
terms (fs and fp) previously described. A series-resonant crystal is intended to
operate in a circuit with a low-impedance load across its terminals and,
consequently, resonates very close to the series-resonant frequency (fs). A
parallel-resonant crystal is intended to operate in a circuit with a
high-impedance load across its terminals and operates at some frequency slightly
above fs where the crystal’s reactance is inductive. In this case, the crystal
attempts to resonate at the frequency at which its own inductive reactance
exactly cancels the capacitive reactance of the combination of Co and an
external capacitive load. If supplied with the desired frequency and the
external load to which the crystal will be connected, the manufacturer can
produce a crystal that meets both of these requirements. The oscillator circuit
used on the ’C3x devices requires a parallel-resonant crystal.

9.2.2 Crystal Response to Square-Wave Drive 

Figure 9-5(a) shows the equivalent circuit model of a crystal driven by a
step-function voltage source in series with a resistive load. In this figure,
the capacitance Co of the crystal model is ignored because it is usually
considered part of the load on the crystal and does not strongly affect the
series-resonant frequency. When a step function excites a crystal, the crystal
produces a damped sinusoidal oscillation at its series-resonant frequency, as
shown in Figure 9-5(b). The magnitude of the damping on the output waveform is
proportional to the magnitude of Rx.
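With Co ignored, the crystal model in Figure 9-5(a) is a series RLC circuit, whose ringing envelope decays as exp(-Rx·t/(2Lx)); the decay rate is therefore proportional to Rx. The Python sketch below computes the envelope time constant and the quality factor Q for a few values of Rx. The motional values are hypothetical and the resistive load in series with the source is neglected.

```python
import math


def envelope_time_constant(Lx: float, Rx: float) -> float:
    """Time for the ringing envelope exp(-Rx*t / (2*Lx)) to fall to 1/e."""
    return 2.0 * Lx / Rx


def quality_factor(Lx: float, Cx: float, Rx: float) -> float:
    """Q = w_s * Lx / Rx, evaluated at the series-resonant frequency."""
    w_s = 1.0 / math.sqrt(Lx * Cx)
    return w_s * Lx / Rx


LX, CX = 3.17e-3, 20e-15  # hypothetical motional values (about 20 MHz)
for rx in (20.0, 40.0, 80.0):  # larger Rx: faster decay, lower Q
    tau = envelope_time_constant(LX, rx)
    q = quality_factor(LX, CX, rx)
    print(f"Rx = {rx:4.0f} ohm: tau = {tau * 1e6:6.1f} us, Q = {q:8.0f}")
```

Doubling Rx halves both the time constant and Q, which is why low-Rx crystals ring longer and give better frequency stability.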

The lowest natural frequency of the crystal is the fundamental frequency.
Depending on the design of the crystal, its output waveform can also contain
contributions from odd multiples of the fundamental frequency, or overtones.
However, if the response at the fundamental frequency is considerably stronger
than the response at these overtone frequencies, the contribution of the
overtones to the output waveform is negligible.

If the step-function input is changed to a square-wave drive (a periodic set of
step functions) at the frequency of the fundamental, the output of the crystal
is sinusoidal, as shown in Figure 9-5(c). The source of the square wave provides
enough energy to overcome the damping in each cycle. Although a square wave has
a high content of odd overtones, the crystal resonates at its fundamental
frequency and strongly attenuates all other frequencies. Consequently, the
output of a crystal driven by a square wave is sinusoidal. If this sinusoidal
output is fed back to the input of an appropriately designed amplifier, as shown
in Figure 9-5(d), sustained oscillation is generated.
Figure 9-5. Crystal Response to a Square-Wave Drive

(Panels: (a) circuit, with crystal model Lx, Rx, Cx; (b) step function;
(c) square-wave drive; (d) amplifier.)

Notes: 1) Co is the capacitance of the two electrodes.

2) Lx, Rx, and Cx model the electrical behavior related to the mechanical vibration of
the crystal; Lx and Cx control the resonant frequency according to the same equation
shown in Figure 9-1 and Rx models the mechanical energy loss in the crystal.
9.3 Pierce Oscillator Circuit

Figure 9-6 shows an oscillator circuit in its simplest form: an amplifier and a
feedback network. This circuit must meet two requirements to sustain
oscillation:

□ The circuit must have positive feedback. 

□ The open loop gain must be greater than 1. 

In Figure 9-6, A is the gain of the amplifier and B is the gain of the feedback
network. For the circuit to have an open-loop gain greater than 1, A × B must be
greater than 1. For the circuit to have positive feedback, the phase shift
around the loop must be 0° (or n × 360°, where n = 0, 1, 2, 3, ...). If these
conditions are met, the output oscillates at a frequency determined by the
frequency-selective feedback network, and the amplitude increases until it
reaches the linearity limit of the amplifier.
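The two oscillation conditions can be checked numerically by treating A and B as complex gains at a given frequency. In the Python sketch below, the amplifier magnitude of 5.6 is borrowed from the minimum inverter gain quoted in Section 9.4.3 purely for illustration; the feedback-network values, the phase tolerance, and the function name are hypothetical.

```python
import cmath
import math


def meets_oscillation_conditions(A: complex, B: complex, phase_tol_deg: float = 1.0) -> bool:
    """True if |A*B| > 1 and the loop phase is n*360 degrees (within a tolerance)."""
    loop = A * B
    phase_deg = math.degrees(cmath.phase(loop))  # wrapped into [-180, 180]
    return abs(loop) > 1.0 and abs(phase_deg) <= phase_tol_deg


# Inverting amplifier: magnitude 5.6, phase shift 180 degrees.
A = cmath.rect(5.6, math.radians(180))

B_ok = cmath.rect(0.3, math.radians(180))          # loop gain 1.68 at 360 degrees
B_weak = cmath.rect(0.1, math.radians(180))        # loop gain 0.56: too much loss
B_wrong_phase = cmath.rect(0.3, math.radians(90))  # loop phase 270 degrees

for name, B in (("ok", B_ok), ("weak", B_weak), ("wrong phase", B_wrong_phase)):
    print(name, meets_oscillation_conditions(A, B))
```

Only the first network satisfies both conditions; the second fails on gain and the third on phase.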

Figure 9-6. Simple Form of an Oscillator Circuit

(Block diagram: amplifier with feedback network.)
There are many possible combinations of amplifiers, crystals, and
phase-shifting components (inductors and capacitors) that meet the conditions
for oscillation specified above. One of the most common is a circuit based on
the Pierce oscillator. Figure 9-7 shows an ideal version of this circuit. The
Pierce oscillator uses an inverting amplifier, a parallel-resonant crystal as a
resonator, and two capacitors as phase-shifting elements and load for the
crystal. This circuit is used for several reasons:

□ It has a large frequency range, from approximately 1 kHz to 200 MHz. 

□ It has high Q (because the load impedances are mostly capacitive and not 
resistive) and consequently exhibits very good stability. 

□ It maintains a high output signal while driving the crystal at a low power
level. This is important at higher frequencies, where crystals are physically
thinner and therefore have lower power-dissipation limits.

□ The low-pass RC networks formed by the crystal and load capacitors tend 
to filter transient noise spikes, giving the circuit good noise immunity. 

Figure 9-7. Pierce Circuit: Ideal Operation

(Schematic annotations: 180° amplifier phase shift and 90° network phase shifts;
crystal-network phase of 90° at series resonance and > 90° above it.)


9.3.1 Oscillator Operation 

The ideal circuit operates in the following manner. An input signal to the
amplifier appears at the output, phase-shifted by approximately 180°. If it is
assumed that at a certain frequency the impedance of C1 is much smaller than R1,
then this RC network introduces another phase shift of approximately 90°. At the
series-resonant frequency, the crystal appears to be a resistor and forms
another RC network with C2. If the impedance of C2 is much smaller than the
series resistance (Rx) of the crystal, this network provides another 90° phase
shift. The total phase shift around the loop is now 180° + 90° + 90° = 360°.
This phase shift meets one of the conditions for oscillation. If the gain of the
amplifier is high enough to overcome the losses in the R1-C1-crystal (Rx)-C2
network for a total loop gain of greater than 1, then the circuit meets both
oscillation conditions and oscillates.

This explanation, however, is unrealistic because it ignores too many
real-world circuit effects. Figure 9-8 illustrates a more typical example of
the circuit behavior. In this case, the inverting amplifier has some phase
delay, which causes it to produce a phase shift somewhat greater than 180°,
depending on the frequency of operation. If oscillation is to occur, the passive
components must compensate for this phase difference. The only way the
impedance of the load capacitances can change is for the frequency of operation
to change. The frequency of operation tends to move above the series-resonant
frequency, lowering the impedance of the load capacitances and raising the
impedance of the crystal as it goes from being purely resistive to being both
resistive and inductive (see Figure 9-2 (c) on page 9-5). When the frequency
changes such that the loop phase shift once again equals 360°, the circuit
oscillates at the higher frequency. For this reason, most Pierce circuits
operate 5-40 ppm above the series-resonant frequency. This explanation
illustrates the circuit’s actual behavior and explains why a parallel-resonant
crystal always operates slightly above the series-resonant frequency.

Figure 9-8. Pierce Circuit: Actual Operation

(Schematic annotations: loop phase shifts of 185°, 73°, and 102°, totaling 360°.)
When a square-wave output is desired (such as for a microprocessor clock
source), the Pierce circuit is sometimes implemented as shown in Figure 9-9. The
crystal and load capacitances are in the same configuration as in the circuit
shown in Figure 9-8, except that R1 is replaced with the output impedance of the
inverter. In the linear region, the inverter behaves like a linear inverting
amplifier. The resistor (Rf) is introduced across the inverter to bias it into
the linear region, which is the transition region between the two digital
states, as shown in Figure 9-11 on page 9-14. Otherwise, the inverter output
moves toward one of its two stable digital states, and oscillation does not
start because there is no gain in these regions (the output characteristic
shown in Figure 9-11 on page 9-14 is flat).

Figure 9-9. Pierce Circuit for Square-Wave Output 



Removing R1 from the circuit improves the loop gain and thus improves the
likelihood of oscillation. However, removing R1 also increases the drive level
(power dissipation) on the crystal. The power dissipation limit of the crystal
must not be exceeded under these conditions (power dissipation issues are
discussed in Section 9.4.4 on page 9-18). Otherwise, the circuit operation is
identical to that described for Figure 9-8.

The second inverter is added as a buffer and a waveshaping device. Since the
output of the crystal is sinusoidal, the output of the first inverter is also
sinusoidal. The second inverter provides a rail-to-rail square-wave output at
the oscillation frequency to drive the microprocessor clock.




9.3.2 Pierce Oscillator Configuration for the TMS320C30 and TMS320C31 

The ’C3x DSPs have two options for clocking the processor: 

□ Divide-by-2 operation of an externally supplied clock 

□ Divide-by-2 operation using the internal oscillator 

To use the ’C3x internal oscillator, connect the crystal across the X2/CLKIN
and X1 pins of the ’C30 or ’C31 (the ’C32 does not support the internal
oscillator option).

The ’C3x oscillator circuitry (with the exception of the crystal and the load
capacitors) is integrated into the processor. Figure 9-10 shows the ’C3x
oscillator circuitry, which is similar to the Pierce integrated-circuit
oscillator shown in Figure 9-9. On the ’C3x, the waveshaping inverter (I2) takes
its input from the input side of the inverter being used as the amplifier (I1)
rather than from the output as in the Pierce oscillator. This has little effect
on the oscillator other than generating the digital complement of the clock
that is generated in the circuit of Figure 9-9. Also, the feedback resistor in
Figure 9-9 is integrated into the ’C3x as an active-load transistor-feedback
network, so an external feedback resistor is unnecessary. This feedback network
ensures that inverter I1 is biased in its linear region.

Figure 9-10. TMS320C3x Oscillator Circuitry 



The inverters in the oscillator circuitry differ from the usual CMOS inverter
configuration (shown in Figure 9-11) in that the p-channel transistor is biased
as an active load instead of having its gate connected as the input of the
inverter. This difference is part of the biasing scheme, which helps to ensure
that the oscillator starts when power is applied. This design causes the rise
and fall times to be asymmetrical (for example, the rise time is longer than the
fall time), but since the oscillator output is divided by 2 before driving the
internal processor circuitry, the duty cycle of the final clock (H1 or H3) is
50%.


Figure 9-11. Digital Inverter Circuit and Its Transfer Characteristic 



9.3.3 Overtone Operation of the Oscillator 

Although crystals are usually considered to vibrate at only one frequency, they
also resonate at odd multiples, or overtones, of the series-resonant frequency.
The series-resonant frequency is the fundamental frequency of the crystal, and
the odd overtones are odd multiples of the fundamental frequency (for example:
3x, 5x, 7x, ...). For low frequencies, it is common to operate crystals at their
fundamental frequency. For higher frequencies, the crystal is made thinner. The
thinner the crystal is, the more fragile and expensive it becomes. Thinner
crystals also have a low power-dissipation limit and are easily damaged when
overdriven.

Most fundamental-mode crystals operate at frequencies of 40 MHz or less. To
generate frequencies higher than 40 MHz, it is common to use overtone crystals.
Overtone crystals are optimized for operation at an overtone frequency with the
fundamental frequency attenuated. Figure 9-12 illustrates the impedance of a
crystal with respect to frequency. The strongest change in impedance is at the
fundamental frequency, but there is also a response at the third and fifth
overtones. If a crystal with the properties in Figure 9-12 is used in a Pierce
circuit, it oscillates at the fundamental frequency. However, if the fundamental
frequency is attenuated, the crystal circuit oscillates at the next higher odd
overtone, in this case, the third overtone. High-frequency operation is
achieved by using an overtone crystal and attenuating the fundamental
frequency.
Figure 9-12. Impedance Characteristics of a Crystal
For the Pierce circuit used on the ’C3x, this attenuation of the fundamental
frequency is achieved by capacitively coupling an inductor (L1) in parallel with
the load capacitor (C1), as shown in Figure 9-13. The value of L1 is chosen to
resonate with C1 at some intermediate frequency between the frequency of the
desired overtone and the next lower odd overtone. At the desired overtone
frequency, the impedance of L1 is high enough compared to C1 that L1 can be
neglected, and the network of C1 and the inverter’s output impedance provides
the desired near-90° phase lag. Since the phase conditions are met, the circuit
oscillates at this frequency. At all lower overtones, L1 has a lower impedance
than C1 and causes a 90° phase lead instead of a phase lag. At any of these
lower frequencies, the total phase shift around the feedback loop is 180°, not
360°; this negative feedback stabilizes the circuit and prevents oscillation.
L1 is coupled with a 0.1-µF capacitor, which prevents the inductor from altering
the dc bias of the inverter while adding negligible impedance at the
oscillation frequency.

Figure 9-13. Oscillator Circuit for Overtone Crystal Operation 



As an example, assume a 60-MHz third-overtone crystal is used with 10-pF load
capacitors. The fundamental for this crystal is at 60/3 = 20 MHz. L1 must be
chosen to resonate with C1 at a frequency between 20 and 60 MHz. If you choose
the frequency halfway in between, 40 MHz, the value of L1 is calculated as
follows:

$$L_1 = \frac{1}{\omega^2 C_1} = \frac{1}{4\pi^2 f^2 C_1} = \frac{1}{4\pi^2 (40 \times 10^6)^2 (10 \times 10^{-12})} \approx 1.58\ \mu\text{H}$$

Since the value of this inductance is not critical, the closest conveniently
available inductor can be used, as long as the resonant frequency of L1-C1 falls
between the desired overtone and the next lower overtone.
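The inductor calculation and the "not critical" check can both be scripted. The Python sketch below recomputes L1 for the 60-MHz third-overtone example with 10-pF load capacitors, then confirms that a nearby standard value still resonates with C1 between the fundamental and the third overtone. The 1.5-µH part is a hypothetical choice, and the function names are illustrative.

```python
import math


def trap_inductor(f_trap_hz: float, c1_farads: float) -> float:
    """L1 = 1 / ((2*pi*f)^2 * C1): the inductance that resonates with C1 at f."""
    return 1.0 / ((2 * math.pi * f_trap_hz) ** 2 * c1_farads)


def lc_resonance_hz(l_henries: float, c_farads: float) -> float:
    """Resonant frequency of an L-C pair."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henries * c_farads))


OVERTONE_HZ = 60e6                 # desired third-overtone frequency
FUNDAMENTAL_HZ = OVERTONE_HZ / 3   # next lower odd overtone (the fundamental)
C1 = 10e-12                        # load capacitor

l1 = trap_inductor((FUNDAMENTAL_HZ + OVERTONE_HZ) / 2, C1)
print(f"computed L1 = {l1 * 1e6:.2f} uH")

# A nearby standard value (hypothetical choice) only needs to keep the
# L1-C1 resonance between the fundamental and the desired overtone.
L1_STD = 1.5e-6
f_res = lc_resonance_hz(L1_STD, C1)
ok = FUNDAMENTAL_HZ < f_res < OVERTONE_HZ
print(f"1.5 uH resonates at {f_res / 1e6:.1f} MHz; acceptable: {ok}")
```

The computed value is about 1.58 µH, and a 1.5-µH part resonates with C1 near 41 MHz, well inside the 20-60 MHz window.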

A variety of crystals have been evaluated in this circuit. Although
fifth-overtone crystals are more commonly available at higher frequencies, they
are not recommended for this circuit. The available gain from the internal
inverting amplifier limits this configuration to third-overtone crystals.
Several third-overtone crystal solutions for this circuit up to 60 MHz are
listed in Table 9-2 on page 9-22.
9.4 Design Considerations


This section discusses some of the aspects of the design of the oscillator and 
their effects on its operation. 


9.4.1 Crystal Series Resistance (Rx) 


The series resistance of the crystal has a strong effect on the design of the
oscillator, primarily on loop gain. Rx limits the crystal’s minimum impedance
value (seen at series resonance). Since the impedances of Lx and Cx cancel each
other at this frequency, the impedance of the crystal is due entirely to Rx. The
voltage divider formed by the crystal and C2 influences the loop gain: as the
impedance of the crystal becomes larger, the loss of gain due to the voltage
divider becomes greater. Low loop gain causes the oscillator to take longer to
start up and prevents oscillation if the overall loop gain falls below 1.
Higher crystal series resistance also reduces the overall oscillator circuit Q,
resulting in poorer frequency stability. For these reasons, it is desirable to
use the lowest Rx possible. Crystals with a series resistance of 40 Ω or less
are recommended.


9.4.2 Load Capacitors 


In the Pierce circuit used on the ’C3x, the load capacitors have a strong effect
on how far above the series-resonant frequency the crystal oscillates. The
crystal’s shunt terminal capacitance, Co, is considered part of the crystal’s
external load capacitance as far as the frequency-controlling elements (Cx and
Lx) are concerned. A parallel-resonance oscillator circuit operates at the
frequency where the reactances of the crystal (Cx and Lx) cancel the reactances
from the load (Co, C1, C2). Consequently, changes in the external load
capacitance cause the oscillator to change frequency to compensate for the
phase change. The following formula gives an approximate value for the
frequency shift from the series-resonant frequency:



$$\frac{\Delta f}{f_s} \approx \frac{1}{2r\left(1 + \dfrac{C_L}{C_o}\right)}$$

where $r = C_o / C_x$ and $C_L = C_1 + C_2$.

The derivative of this formula with respect to $C_L$, shown below, is useful for
determining the frequency variation due to changes in the load capacitance.
This derivative is applied to find the frequency range implied by a load
capacitance with a given tolerance. Also, if there is a need to adjust the
operating frequency, use this formula to determine the appropriate value of a
variable load capacitor.

$$\frac{d(\Delta f / f_s)}{dC_L} \approx -\frac{1}{2 r C_o \left(1 + \dfrac{C_L}{C_o}\right)^2}$$
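The following Python sketch evaluates the load-capacitance pulling, assuming the approximation Δf/fs ≈ 1/(2r(1 + CL/Co)) with r = Co/Cx and this section's convention CL = C1 + C2. The crystal parameters are hypothetical; the second function is the derivative with respect to CL, used here to estimate the frequency spread caused by a load-capacitance tolerance.

```python
def load_shift_ppm(cx: float, co: float, cl: float) -> float:
    """Approximate offset above f_s in ppm: 1 / (2*r*(1 + CL/Co)), r = Co/Cx."""
    r = co / cx
    return 1e6 / (2.0 * r * (1.0 + cl / co))


def load_sensitivity_ppm_per_pf(cx: float, co: float, cl: float) -> float:
    """Derivative of the offset with respect to CL, in ppm per pF."""
    r = co / cx
    per_farad = -1e6 / (2.0 * r * co * (1.0 + cl / co) ** 2)
    return per_farad * 1e-12


# Hypothetical crystal parameters; CL = C1 + C2 = 10 pF + 10 pF.
CX, CO, CL = 20e-15, 5e-12, 20e-12
shift = load_shift_ppm(CX, CO, CL)
slope = load_sensitivity_ppm_per_pf(CX, CO, CL)
print(f"offset above f_s: {shift:.0f} ppm, sensitivity: {slope:.1f} ppm/pF")
print(f"a +/-2 pF load tolerance moves the frequency by about +/-{abs(slope) * 2:.0f} ppm")
```

The negative slope reflects that adding load capacitance pulls the oscillation frequency back down toward fs.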
Crystal manufacturers often accommodate requests for specific values of load
capacitance to be used with their crystals. Values of 20 pF and 30 pF are
commonly available. These load capacitance values are represented by C1 + C2,
so for a crystal designed for a load capacitance of 20 pF, C1 = C2 = 10 pF is
used. Capacitance values higher than 30 pF increase attenuation, lowering the
overall loop gain, and can cause the circuit to stop oscillating. A load
capacitance of 20-30 pF is recommended for high-frequency crystals. Ceramic
resonators usually require higher load capacitance than high-frequency crystals
(see the manufacturer’s recommendations). Load capacitance values are included
in Table 9-2 on page 9-22.


9.4.3 Loop Gain 


Loop gain primarily affects the startup time of the oscillator. Overall loop
gain must be greater than 1 for oscillation to be sustained. Higher loop gain
causes the oscillation amplitude to increase rapidly, reducing the time
necessary for the oscillator to reach its steady state.

The minimum gain measured for the ’C3x inverter is 5.6. To maintain an overall
loop gain of at least 1, the external component network of C1-crystal-C2 must
not introduce a loss greater than a factor of 5.6. For this reason, the values
of the load capacitance and crystal series resistance strongly affect whether
the circuit oscillates.

9.4.4 Drive Level/Power Dissipation 

Another parameter specified when ordering a crystal is the drive level or power
dissipation. Higher-frequency crystals generally have lower power dissipation
ratings because the crystal is physically thinner and is damaged by excessive
voltages. Power dissipation also affects frequency stability because the
crystal’s frequency of operation is dependent on temperature. Excessive power
dissipation causes crystal heating and results in frequency drift.

There is no convenient way to measure the power dissipation in the crystal
directly. The series resistance (Rx) is the only power-dissipating component in
the crystal, but the external voltage measured across the crystal also includes
the voltage across Lx and Cx. Therefore, the power dissipation in Rx cannot
easily be calculated from the voltage on the crystal. Instead, measure the
current through the crystal with a current probe, or measure it indirectly as
the voltage across a small resistor in series with the crystal. You can then
calculate the power as I² × Rx.
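The indirect measurement described above reduces to two steps: Ohm's law across the sense resistor, then I² × Rx. The sketch below assumes RMS voltage readings and a sense resistor small enough not to disturb the oscillator; the function names and measurement values are hypothetical. The 40-Ω Rx in the example is the maximum series resistance allowed for the circuits in Section 9.5.

```python
# Sketch: crystal power from a series sense-resistor measurement (P = I^2 * Rx).
# Assumptions: all voltages are RMS, the sense resistor is small enough not to
# disturb the oscillator, and Rx is taken from the crystal data sheet.


def crystal_current_rms(v_sense_rms: float, r_sense_ohm: float) -> float:
    """Crystal current from the voltage across a small series sense resistor."""
    if r_sense_ohm <= 0:
        raise ValueError("r_sense_ohm must be positive")
    return v_sense_rms / r_sense_ohm  # Ohm's law


def crystal_power_w(i_rms: float, r_x_ohm: float) -> float:
    """Power dissipated in the crystal's series resistance Rx, in watts."""
    return i_rms ** 2 * r_x_ohm


def within_rating(p_w: float, rating_w: float = 1e-3) -> bool:
    """True if the dissipation does not exceed the crystal's rating (default 1 mW)."""
    return p_w <= rating_w


if __name__ == "__main__":
    # Hypothetical measurement: 50 mV RMS across a 10-ohm sense resistor,
    # crystal with Rx = 40 ohms.
    i = crystal_current_rms(0.050, 10.0)
    p = crystal_power_w(i, 40.0)
    print(f"I = {i * 1e3:.2f} mA, P = {p * 1e3:.3f} mW, ok = {within_rating(p)}")
```

In this hypothetical case the crystal sits right at 1 mW, which is the situation where the text suggests adding Rd or choosing a crystal with a higher rating.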
Once the drive level is known, if it must be limited, one of the simplest ways
to do so is shown in Figure 9-14. A resistor (Rd) is added in series between X1
and the external components. This resistor drops part of the voltage driven by
the ’C3x and consequently lowers the drive voltage on the crystal. The
disadvantage of this method is that the voltage drop reduces the overall loop
gain of the oscillator circuit. The value of Rd must be large enough to bring the
power dissipation of the crystal within the manufacturer’s specification, but not
so large that the loop gain drops below 1; otherwise, the circuit stops
oscillating. Crystals with power dissipation ratings of at least 1 mW are
recommended.

The oscillator circuit solutions in Table 9-2, when operated without Rd, have
yielded crystal power dissipation measurements near 1 mW. Differences in
circuit and crystal parameters can cause the power dissipation in the crystal
to slightly exceed 1 mW. If crystal power dissipation is critical, either add a
33-Ω resistor (Rd) to limit the crystal power dissipation or obtain crystals with
power dissipation ratings higher than 1 mW. When operated with Rd = 33 Ω,
each of the circuit solutions shown in Table 9-2 has exhibited less than 1 mW
of crystal power dissipation.

Figure 9-14. Addition of Rd to Limit Drive Level of the Crystal
9.4.5 Startup Time


Figure 9-15 shows that when the oscillator starts, low-amplitude oscillations
gradually build until the linearity limit of the amplifier is reached. You
experience this startup time at power-up. Maximizing loop gain minimizes the
startup time for the oscillator.

Startup time depends on the external components used, but the oscillator
generally requires at least 100 ms after power-up to stabilize. For this
reason, a reset delay of 150-200 ms following power-up is recommended.


Figure 9-15. Oscillator Startup

[Figure not reproduced. Labels: Power applied, Vdd, 0 V.]

9.4.6 Frequency-Temperature Characteristics of Crystals

The actual operating frequency of a crystal depends on temperature. The
extent to which frequency changes with respect to temperature strongly relates
to the cut of the crystal. AT- and SC-cut crystals behave differently from DT-,
CT-, and BT-cut crystals. Even slight changes in the cut angle of the crystal can
strongly affect the frequency-temperature characteristics.

Most crystals available in the frequency range of interest for DSPs are AT-cut
crystals. The frequency-temperature characteristic for AT-cut crystals is a
third-order function, similar to that shown in Figure 9-16. This graph shows the
general temperature-frequency behavior of AT-cut crystals. Similar information
is readily available from crystal manufacturers.
Figure 9-16. Example Frequency-Temperature Characteristic of AT-Cut Crystals

[Figure not reproduced. Horizontal axis: Temperature (°C).]
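Because the AT-cut characteristic is a third-order function of temperature, a cubic polynomial is a natural way to estimate frequency deviation over an operating range. In the sketch below, the polynomial form, the reference temperature, and all coefficients are assumptions for illustration; real coefficients must come from the crystal manufacturer's data, and the function names are hypothetical.

```python
# Sketch: a general third-order frequency-temperature model, as suggested by
# the AT-cut curve above. The polynomial form and the reference temperature
# are assumptions; real coefficients must come from the crystal manufacturer.


def freq_deviation_ppm(temp_c: float, ref_temp_c: float,
                       a1: float, a2: float, a3: float) -> float:
    """Frequency deviation in ppm at temp_c, relative to ref_temp_c."""
    dt = temp_c - ref_temp_c
    return a1 * dt + a2 * dt ** 2 + a3 * dt ** 3


def worst_case_ppm(temps_c, ref_temp_c, a1, a2, a3) -> float:
    """Largest absolute deviation over a list of operating temperatures."""
    return max(abs(freq_deviation_ppm(t, ref_temp_c, a1, a2, a3)) for t in temps_c)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    A1, A2, A3 = -0.1, 0.0, 1e-4
    temps = range(0, 71, 10)  # 0 to 70 degrees C
    print(worst_case_ppm(temps, 25.0, A1, A2, A3))
```

Sampling the range at coarse steps is only a quick estimate; near the turning points of the curve a finer step gives a more accurate worst case.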

9.4.7 Crystal Aging 


Crystal aging is the gradual change in the frequency of a crystal over time.
This change occurs because of stress relief between the mounting structure and
the electrodes, and because of absorption (or desorption) of contaminants from
the resonator surfaces. Changes in temperature accelerate both of these
mechanisms. The major aging mechanism in crystals above 1 MHz is mass transfer
to and from the resonator surfaces. The most rapid aging occurs early in the
crystal’s lifetime; then aging tends to stabilize. For example, a crystal that
ages 10-60 parts per million (ppm) in a year experiences 5 ppm of that aging in
the first month. Crystals with very low aging rates are available (at additional
expense) because of cleaner fabrication and packaging processes. These crystals
have aging characteristics as low as 1 × 10⁻⁸ ppm per year. Complete information
on aging characteristics is available from crystal manufacturers.
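Aging figures quoted in ppm are easier to judge as an absolute frequency error. The sketch below converts the example ppm values from the text into hertz; the 40-MHz nominal frequency is an assumption chosen to match the Table 9-2 crystals, and the function names are hypothetical.

```python
# Sketch: convert crystal aging in ppm to an absolute frequency change.
# The 40-MHz nominal frequency is an assumption chosen to match Table 9-2;
# the ppm figures are the example values quoted in the text.


def ppm_to_hz(nominal_hz: float, ppm: float) -> float:
    """Absolute frequency change corresponding to a change of `ppm`."""
    return nominal_hz * ppm / 1e6


def aged_frequency_hz(nominal_hz: float, aging_ppm: float) -> float:
    """Frequency after `aging_ppm` of aging (the sign gives the direction)."""
    return nominal_hz + ppm_to_hz(nominal_hz, aging_ppm)


if __name__ == "__main__":
    nominal = 40e6
    for label, ppm in (("first month", 5), ("one year, low", 10), ("one year, high", 60)):
        print(f"{label}: {ppm} ppm = {ppm_to_hz(nominal, ppm):.0f} Hz")
```

At 40 MHz, the 5 ppm of first-month aging in the example corresponds to 200 Hz, and 60 ppm over a year corresponds to 2.4 kHz.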


9.5 Oscillator Solutions for Common Frequencies 

The oscillator solutions in this section were built and tested with samples from
the manufacturers listed in Table 9-2. These circuits were tested at room
temperature and verified to operate correctly within the recommended range of
Vdd (4.75-5.25 V).

Table 9-2. Oscillator Solutions by Frequency

Frequency  Mode            Type     Supplier  Part Number          C1 = C2  Rd        L1
40 MHz     Fundamental     Crystal  SaRonix   HFX series crystals  10 pF    0/33 Ω†   -
40 MHz     Third overtone  Crystal  Anderson  011-668-04663        10 pF    0/33 Ω†   3.3 µH
50 MHz     Fundamental     Crystal  SaRonix   HFX series crystals  10 pF    0/33 Ω†   -
50 MHz     Third overtone  Crystal  SaRonix   SRX5223              10 pF    0/33 Ω†   3.3 µH
60 MHz     Third overtone  Crystal  Anderson  011-668-04725        10 pF    0/33 Ω†   3.3 µH

† When these circuits are operated without Rd, they yield crystal power dissipation measurements near 1 mW. Differences in circuit
and crystal parameters can cause the power dissipation in the crystal to slightly exceed 1 mW. If crystal power dissipation is critical,
either add 33 Ω of Rd to limit the crystal power dissipation or obtain crystals with power dissipation ratings higher than 1 mW.
When operated with Rd = 33 Ω, each of the circuits shown exhibited less than 1 mW of crystal power dissipation.

The following circuits are used for ceramic resonators and fundamental-mode
crystal resonators. The circuit in Figure 9-17 is used for all circuits marked
fundamental mode in Table 9-2. The circuit in Figure 9-18 is used for all circuits
marked third-overtone mode in Table 9-2. Crystals used in these circuits must
be parallel resonant, must have a series resistance of 40 Ω or less, and must
have a power dissipation rating of 1 mW or greater.
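When choosing among the tested solutions, it can help to treat Table 9-2 as data and to check a candidate crystal against the three requirements stated above. The sketch below does both. The class, field, and function names are hypothetical, not a TI API; the row values are copied from Table 9-2, and Rd is left out because every row allows either 0 or 33 Ω.

```python
# Sketch: Table 9-2 encoded as data, plus the crystal requirements stated
# above. The class and function names are illustrative, not a TI API.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OscillatorSolution:
    freq_mhz: int
    mode: str               # "fundamental" or "third overtone"
    supplier: str
    part_number: str
    c1_c2_pf: float         # C1 = C2
    l1_uh: Optional[float]  # L1 is used only in the third-overtone circuit

    @property
    def figure(self) -> str:
        """Schematic to build: Figure 9-17 (fundamental) or 9-18 (third overtone)."""
        return "9-17" if self.mode == "fundamental" else "9-18"


# Rd is 0 or 33 ohms for every row (see the table footnote), so it is omitted.
TABLE_9_2: List[OscillatorSolution] = [
    OscillatorSolution(40, "fundamental", "SaRonix", "HFX series", 10.0, None),
    OscillatorSolution(40, "third overtone", "Anderson", "011-668-04663", 10.0, 3.3),
    OscillatorSolution(50, "fundamental", "SaRonix", "HFX series", 10.0, None),
    OscillatorSolution(50, "third overtone", "SaRonix", "SRX5223", 10.0, 3.3),
    OscillatorSolution(60, "third overtone", "Anderson", "011-668-04725", 10.0, 3.3),
]


def solutions_for(freq_mhz: int) -> List[OscillatorSolution]:
    """All tested solutions for a given frequency."""
    return [s for s in TABLE_9_2 if s.freq_mhz == freq_mhz]


def crystal_meets_requirements(parallel_resonant: bool,
                               series_resistance_ohm: float,
                               power_rating_mw: float) -> bool:
    """Parallel resonant, series resistance <= 40 ohms, rating >= 1 mW."""
    return (parallel_resonant
            and series_resistance_ohm <= 40.0
            and power_rating_mw >= 1.0)
```

For example, `solutions_for(60)` returns only the third-overtone Anderson crystal, whose `figure` property points to the Figure 9-18 circuit.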

Figure 9-17. Fundamental-Mode Circuit 
Figure 9-18. Third-Overtone Circuit
Chapter 10


XDS510 Emulator Design Considerations 


This chapter explains the design requirements of the XDS510 emulator and
discusses the Extended Development System (XDS) cable (manufacturing
part number 2617698-0001). This cable is identified by a label on the cable
pod marked JTAG3/5V and supports both standard 3-V and 5-V target system
power inputs.

The term JTAG emulation, as used in this book, refers to TI scan-based
emulation, which is based on the IEEE 1149.1 standard.


10.1 Designing the MPSD Emulator Connector (12-Pin Header)

The ’C3x uses modular port scan device (MPSD) technology to allow complete
emulation through a serial scan path of the ’C3x. To communicate with the
emulator, your target system must have a 12-pin header (2 rows of 6 pins) with
the connections shown in Figure 10-1. To use the target cable, supply
the signals shown in Table 10-1 to a 12-pin header with pin 8 cut out to provide
keying. For the latest information, see the JTAG/MPSD Emulation Technical
Reference.

Although you can use other headers, the recommended header is the
unshrouded, straight header having the following DuPont connector systems
part numbers:

□ 65610-112 

□ 65611-112 

□ 37996-112 

□ 67997-112 

Figure 10-1. 12-Pin Header Signals and Header Dimensions

EMU1†      1   2   GND
EMU0†      3   4   GND
EMU2†      5   6   GND
PD (VCC)   7   8   No pin (key)‡
EMU3       9   10  GND
H3         11  12  GND

Header dimensions:
Pin-to-pin spacing: 0.100 in. (X,Y)
Pin width: 0.025-in. square post
Pin length: 0.235-in. nominal

† These signals must be pulled up with separate 20-kΩ resistors to VCC.
‡ While the corresponding female position on the cable connector is plugged to prevent improper
connection, the cable lead for pin 8 is present in the cable and is grounded as shown in the
schematics and wiring diagrams in this document.

Table 10-1. 12-Pin Header Signal Descriptions and Pin Numbers

XDS510 Signal  Description                      ’C30 Pin Number  ’C31 Pin Number
EMU0           Emulation pin 0                  F14              124
EMU1           Emulation pin 1                  E15              125
EMU2           Emulation pin 2                  F13              126
EMU3           Emulation pin 3                  E14              123
H3             ’C3x H3                          A1               82
PD             Presence detect. Indicates that the emulation cable is connected
               and that the target is powered up. PD must be tied to VCC in the
               target system.


10.2 Emulator Cable Pod Logic 

Figure 10-2 shows a portion of the logic in the emulator cable pod. The 33-Ω
resistors have been added to the EMU0, EMU1, and EMU2 lines to minimize cable
reflections.


Figure 10-2. Emulator Cable Pod Interface 


[Figure not reproduced. Signal labels: EMU3 (pin 9), H3 (pin 11), EMU1 (pin 1), EMU0 (pin 2), EMU2 (pin 3).]
10.3 MPSD Emulator Cable Signal Timing

Figure 10-3 shows the signal timings for the emulator cable pod. Table 10-2
defines the timing parameters. The timing parameters are calculated from
values specified in the standard data sheets for the emulator and cable pod and
are for reference only. Texas Instruments does not test or guarantee these
timings.

Figure 10-3. Emulator Cable Pod Timings 





Table 10-2. Emulator Cable Pod Timing Parameters

No.  Reference           Description                     Min  Max  Unit
1    tH3 min, tH3 max    H3 period                       35   200  ns
2    tH3 high min        H3 high pulse duration          15        ns
3    tH3 low min         H3 low pulse duration           15        ns
4    td(EMU0, 1, 2)      EMU0, 1, 2 valid from H3 low    7    23   ns
5    tsu(EMU3)           EMU3 setup time to H3 high      3         ns
6    thd(EMU3)           EMU3 hold time from H3 high     11        ns
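A quick way to sanity-check a design against Table 10-2 is to encode the minimum and maximum limits and compare planned or measured values. Remember that TI does not test or guarantee these values, so a pass is only a reference check. The sketch assumes, as a fact not stated in this section, that H3 runs at half the CLKIN frequency (as H1 does on the ’C3x); the function names and the 40-MHz design point are hypothetical.

```python
# Sketch: compare planned or measured timings with the Table 10-2 reference
# limits (which TI does not test or guarantee). Assumption not stated in this
# section: H3 runs at half the CLKIN frequency, as H1 does on the 'C3x.

H3_PERIOD_MIN_NS, H3_PERIOD_MAX_NS = 35.0, 200.0  # parameter 1
H3_HIGH_MIN_NS = 15.0                             # parameter 2
H3_LOW_MIN_NS = 15.0                              # parameter 3
EMU3_SETUP_MIN_NS = 3.0                           # parameter 5
EMU3_HOLD_MIN_NS = 11.0                           # parameter 6


def h3_period_ns(clkin_mhz: float) -> float:
    """H3 period in ns, assuming f(H3) = f(CLKIN) / 2."""
    return 2.0 * 1000.0 / clkin_mhz


def h3_ok(period_ns: float, high_ns: float, low_ns: float) -> bool:
    """Parameters 1-3: H3 period within limits and both phases wide enough."""
    return (H3_PERIOD_MIN_NS <= period_ns <= H3_PERIOD_MAX_NS
            and high_ns >= H3_HIGH_MIN_NS
            and low_ns >= H3_LOW_MIN_NS)


def emu3_ok(setup_ns: float, hold_ns: float) -> bool:
    """Parameters 5-6: EMU3 setup and hold relative to H3 high."""
    return setup_ns >= EMU3_SETUP_MIN_NS and hold_ns >= EMU3_HOLD_MIN_NS


if __name__ == "__main__":
    p = h3_period_ns(40.0)  # hypothetical 40-MHz CLKIN design point
    print(p, h3_ok(p, p / 2, p / 2))
```

Parameter 4 (EMU0-EMU2 valid delay) describes the pod's own output timing, so it is not checked here.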
10.4 Connections Between the Emulator and the Target System

It is extremely important to provide high-quality signals between the emulator
and the ’C3x on the target system. In many cases, the signals must be buffered
to maintain that quality. The need for signal buffering falls into three
categories, depending on the placement of the emulation header:

□ No signals buffered. In this situation, the distance between the emulation 
header and the ’C3x should be no more than 2 inches (see Figure 10-4). 


Figure 10-4. Connections Between the Emulator and the TMS320C3x With No Signals 
Buffered 


□ Transmission signals buffered. In this situation, the distance between
the emulation header and the ’C3x is greater than 2 inches but less than
6 inches. The transmission signals, H3 and EMU3, are buffered through
the same package (see Figure 10-5).


Figure 10-5. Connections Between the Emulator and the TMS320C3x With Transmission
Signals Buffered


□ All signals buffered. The distance between the emulation header and the
’C3x is greater than 6 inches but less than 12 inches. All ’C3x emulation
signals, EMU0, EMU1, EMU2, EMU3, and H3, are buffered through the
same package (see Figure 10-6).

Figure 10-6. Connections Between the Emulator and the TMS320C3x With All Signals 
Buffered 
CAUTION


H3 buffer restrictions 

Do not connect any devices between 
the buffered H3 output and the header! 
Otherwise, you will degrade the quality 
of the signal. 
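The three buffering categories amount to a lookup by header-to-’C3x distance. The sketch below encodes them; the function name is hypothetical. Treating exactly 6 inches as "transmission signals only" and rejecting 12 inches or more are assumptions, because the text does not cover those boundary cases. The H3 caution above applies to both buffered cases.

```python
# Sketch: choose a buffering scheme from the header-to-'C3x distance, following
# the three categories above. Treating exactly 6 in. as "transmission only" and
# rejecting 12 in. or more are assumptions; the text does not cover those cases.


def emulation_buffering(distance_in: float) -> str:
    """Return which emulation signals to buffer for a given distance (inches)."""
    if distance_in <= 0:
        raise ValueError("distance must be positive")
    if distance_in <= 2:
        return "none"                  # Figure 10-4
    if distance_in <= 6:
        return "H3 and EMU3"           # Figure 10-5: transmission signals
    if distance_in < 12:
        return "EMU0-EMU3 and H3"      # Figure 10-6: all signals
    raise ValueError("12 in. or more is outside the documented guidelines")
```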


10.5 Mechanical Dimensions for the 12-Pin Emulator Connector

The ’C3x emulator target cable consists of a 3-foot section of jacketed cable,
an active cable pod, and a short section of jacketed cable that connects to the
target system. The overall cable length is approximately 3 feet, 10 inches.
Figure 10-7 and Figure 10-8 show the mechanical dimensions for the target
cable pod and short cable. Note that the pin-to-pin spacing on the connector
is 0.10 inches in both the X and Y planes. The cable pod box is nonconductive
plastic with four recessed metal screws.


Figure 10-7. Pod/Connector Dimensions 
Figure 10-8. 12-Pin Connector Dimensions
Note: All dimensions are in inches and are nominal unless otherwise specified.


10.6 Diagnostic Applications 

For system diagnostic applications, or to embed emulation compatibility on
your target system, connect a ’C3x device directly to a TI ACT8990 test bus
controller (TBC) as shown in Figure 10-9. The TBC is described in the Texas
Instruments Advanced Logic and Bus Interface Logic Data Book. A TBC can
connect to only one ’C3x device.

Figure 10-9. TBC Emulation Connections for TMS320C3x Scan Paths 




Notes: 


1) In a ’C3x design, the TBC can connect to only one ’C3x device. 

2) The ’C3x device’s H1 clock drives TCKI on the TBC. This is different from the 
emulation header connections where H3 is used. 
Chapter 11


Development Support and 
Part Ordering Information 


This chapter provides development support information, device part numbers, 
and support tool ordering information for the ’C3x. 

Each ’C3x support product is described in the TMS320 Family Development
Support Reference Guide. In addition, more than 100 third-party developers
offer products that support the TI TMS320 family. For more information, refer
to the TMS320 Third-Party Reference Guide.

For information on pricing and availability, contact the nearest TI field sales
office or authorized distributor.
11.1 Development Support

This section describes the development support provided by Texas
Instruments.

11.1.1 Development Tools 

Texas Instruments offers an extensive line of development tools for the ’C3x
generation of DSPs, including tools to evaluate the performance of the
processors, generate code, develop algorithm implementations, and fully
integrate and debug software and hardware modules. These tools are described
below.

Code Generation Tools 

There are two types of code generation tools: 

□ Optimizing ANSI C compiler. Translates ANSI C language directly into
highly optimized assembly code. You can then assemble and link this code
with the TI assembler/linker, which is shipped with the compiler. It supports
both ’C3x and ’C4x assembly code. This product is currently available for
the PC (DOS, DOS extended memory, and OS/2), VAX/VMS, and SPARC
workstations. See the TMS320 Floating-Point DSP Optimizing C Compiler
User’s Guide for detailed information.

□ Assembler/linker. Converts source mnemonics to executable object code.
It supports both ’C3x and ’C4x assembly code. This product is currently
available for the PC (DOS, DOS extended memory, and OS/2). The
’C3x/’C4x assembler for the VAX/VMS and SPARC workstations is only
available as part of the optimizing ’C3x/’C4x compiler. See the TMS320
Floating-Point DSP Assembly Language Tools User’s Guide for detailed
information.
System Integration and Debug Tools

There are four types of system integration and debug tools:

□ Simulator. Simulates the operation of the ’C3x in software and can
be used in C and assembly software development. This product is currently
available for the PC (DOS and Windows) and SPARC workstations. See
the TMS320C3x C Source Debugger User’s Guide for detailed information.

□ XDS510 emulator. Performs full-speed in-circuit emulation with the ’C3x,
providing access to all registers as well as to internal and external memory.
It can be used in C and assembly software development and can debug
multiple processors. This product is currently available
for the PC (DOS, Windows, and OS/2) and SPARC workstations. This
product includes the emulator board (emulator box, power supply, and
small computer system interface (SCSI) connector cables in the SPARC
version), the ’C3x C source debugger software, and the JTAG cable.

Because ’C3x and ’C5x XDS510™ emulators also come with the same
emulator board (or box), you can buy the ’C3x C source debugger software
as a separate product called the ’C3x C Source Debugger Conversion
Software. This enables you to debug ’C3x/’C4x/’C5x applications with
the same emulator board. The emulator cable that comes with the ’C5x
XDS510 emulator is not compatible with the ’C3x. You need a JTAG
emulation conversion cable. See the TMS320C3x C Source Debugger
User’s Guide for detailed information on the ’C3x emulator.

□ Evaluation module (EVM). Each EVM comes complete with a PC half-card
and software package. The EVM board contains the following:

■ A ’C30, a 33-MFLOPS, 32-bit floating-point DSP

■ A 16K-word, zero-wait-state SRAM, allowing coding of most algorithms
directly on the board

■ A speaker/microphone-ready analog interface for multimedia,
speech, and audio applications development

■ A multiprocessor serial port interface for connecting to multiple EVMs

■ A host port for PC communications

The system also comes with all the software required to begin applications
development on a PC host. Equipped with a C and assembly language
source-level debugger for the DSP, the EVM has a window-oriented,
mouse-driven interface that enables the downloading, executing, and
debugging of assembly code or C code.
The ’C3x assembler/linker is also included with the EVM. For users who
prefer programming in a high-level language, an optimizing ANSI C compiler
and an Ada compiler are offered separately.

□ Emulation porting kit (EPK). Enables you to integrate emulation technology
directly into your system without the need for an XDS510 board. The
EPK is intended for third parties and high-volume board
manufacturers and requires a licensing agreement with Texas Instruments.
The kit contains host (or PC) source and object code, which lets
you tailor ’C30 EVM-like capabilities to your ’C3x system through the
SM74ACT8990 test bus controller (TBC). The EPK can be used in such
applications as program download for system self-test and initialization, or
system emulation and debug to provide resident emulation support. EPK
software includes the TI high-level language (HLL) debugger in object form as
well as source code for the TBC communication interface. The HLL code
is the windowed debugger found with many TI DSP simulators, EVMs, and
emulators. With the EPK, the HLL user interface can be ported directly to
the system board. The source code for the TBC communication interface
consists of such commands as memory read/write, run, stop, and reset
that communicate with the ’C3x device. Using the EPK reduces system
and development cost and speeds time to market. For more information
on the kit, call the DSP hotline at (281) 274-2320.

11.1.2 TMS320 Third Parties 

The TMS320 family is supported by product and service offerings from more
than 100 independent vendors and consultants, known as third parties. These
support products take various forms (both software and hardware), from
cross-assemblers, simulators, and DSP utility packages to logic analyzers and
emulators. Additionally, TI third parties offer more than 150 algorithms that are
available for license through the TMS320 software cooperative. These
algorithms can greatly reduce development time and decrease time to market.
The expertise of those involved in support services ranges from speech encoding
and vector quantization to software/hardware design and system analysis.

For a more detailed description of services and products offered by third
parties, see the TMS320 Third Party Support Reference Guide and the TMS320
Software Cooperative Data Sheet Packet. Call the Literature Response Center
at (800) 477-8924 to request a copy.
11.1.3 Technical Training Organization (TTO) TMS320 Workshop

The ’C3x DSP design workshop is tailored for hardware and software design
engineers and decision-makers who design and use the ’C3x generation of
DSP devices. Hands-on exercises throughout the course give participants a
rapid start in using ’C3x design skills. Microprocessor/assembly language
experience is required. Experience with digital design techniques and C
language programming is desirable. The following topics are covered
in the ’C3x workshop:

□ ’C3x architecture/instruction set 

□ Use of the PC-based ’C3x software simulator and EVM 

□ Floating-point and parallel operations 

□ Use of the ’C3x assembler/linker 

□ C programming environment 

□ System architecture considerations 

□ Memory and I/O interfacing 

□ ’C3x development support 

For registration, pricing, or enrollment information on this and other TTO 
TMS320 workshops, call (800) 336-5236, ext. 3904. 

11.1.4 TMS320 Literature 

Extensive DSP documentation is available, including data sheets, user’s 
guides, and application reports. In addition, DSP textbooks that aid research 
and education have been published by Prentice-Hall, John Wiley and Sons, 
and Computer Science Press. To order literature or to subscribe to the DSP 
newsletter Details on Signal Processing (for up-to-date information on new 
products and services), call the Literature Response Center at (800) 477-8924
or log on to the DSP Solutions web site at http://www.ti.com/dsps. 


11.1.5 DSP Hotline 


For answers to TMS320 technical questions on device problems, development
tools, documentation, upgrades, and new products, you can contact the
DSP hotline by:

□ Phone at (281) 274-2320 Monday through Friday from 8:30 a.m. to 
5:00 p.m. Central Time 

□ Fax at (281) 274-2324 

□ Electronic mail at dsph@ti.com 

□ European fax at 33-1-3070-1032 

□ Semiconductor Product Information Center (PIC) at (214) 644-5580 
To ask about third-party applications and algorithm development packages,
contact the third party directly. See the TMS320 Third-Party Support
Reference Guide for addresses and phone numbers.

The DSP hotline does not provide pricing information. Contact the nearest TI
field sales office or the TI PIC for prices and availability of TMS320 devices and
support tools.

11.1.6 Bulletin Board Service (BBS) 

The TMS320 DSP Bulletin Board Service (BBS) is a telephone-line computer
service that provides information on TMS320 devices and specification updates
for current or new devices and development tools. The BBS also gives
information about silicon and development tool revisions and enhancements,
new DSP application software as it becomes available, and source code for
programs from any TMS320 user’s guide.

You can access the BBS by: 

□ Modem: (300-, 1200-, or 2400-bps) dial (713) 274-2323. Set your modem
to 8 data bits, 1 stop bit, no parity.

□ Internet: Use anonymous ftp to stp.ti.com (Internet port address
192.94.94.1). The BBS content is located in the subdirectory called
mirrors.

To find out more about the BBS, see the TMS320 Family Development Support
Reference Guide.
11.2 TMS320C3x Part Ordering Information

This section provides device and support tool part numbers. Table 11-1 lists
the part numbers for the ’C30 and ’C31; Table 11-2 gives ordering information
for ’C3x hardware and software support tools. An explanation of the TMS320
family device and development support tool prefix and suffix designators
follows the two tables to help you understand the TMS320 product numbering
system.


Table 11-1. TMS320C3x Digital Signal Processor Part Numbers

Device              Technology    Operating   Package Type          Typical Power
                                  Frequency                         Dissipation
TMS320C30GEL        0.8-µm CMOS   33 MHz      Ceramic 181-pin PGA   1.00 W
TMS320C30GEL40      0.8-µm CMOS   40 MHz      Ceramic 181-pin PGA   1.25 W
TMS320C31PQL/PQA    0.8-µm CMOS   33 MHz      Plastic 132-pin QFP   0.75 W
TMS320C31PQL40      0.8-µm CMOS   40 MHz      Plastic 132-pin QFP   0.90 W
TMS320LC31PQL       0.8-µm CMOS   33 MHz      Plastic 132-pin QFP   0.50 W
TMS320C31PQL50      0.8-µm CMOS   50 MHz      Plastic 132-pin QFP   1.00 W
SMJ320C316FA27      0.8-µm CMOS   28 MHz      Ceramic 141-pin PGA   0.60 W
SMJ320C31HF627                                Ceramic 132-pin QFP   0.60 W
SMJ320C316FA33                                Ceramic 141-pin PGA   0.75 W
SMJ320C316HF633                               Ceramic 132-pin PGA   0.75 W
SMJ320C30GBM33      0.8-µm CMOS   33 MHz      Ceramic 181-pin PGA   1.10 W
SMJ320C30HF633                                Ceramic 196-pin QFP
SMJ320C30GBM28      0.8-µm CMOS   28 MHz      Ceramic 181-pin PGA   1.00 W
SMJ320C30HF628                                Ceramic 196-pin QFP   1.00 W
SMJ320C30HTM28
SMJ320C30GBM25      0.8-µm CMOS   25 MHz      Ceramic 181-pin PGA   1.00 W
SMJ320C30HF625                                Ceramic 196-pin QFP   1.00 W
SMJ320C30HTM25
Table 11-2. TMS320C3x Support Tool Part Numbers

(a) Software

Tool Description                       Operating System       Part Number
C Compiler & Macro Assembler/Linker    VAX/VMS                TMDS3243255-08
                                       PC-DOS/MS-DOS          TMDS3243855-02
                                       SPARC (Sun OS)†        TMDS3243555-08
Assembler/Linker                       PC-DOS/MS-DOS; OS/2    TMDS3243850-02
Simulator                              VAX/VMS                TMDS3243251-08
                                       PC-DOS/MS-DOS          TMDS3243851-02
                                       SPARC (Sun OS)†        TMDS3243551-09
Digital Filter Design Package          PC-DOS                 DFDP
TMS320C3x Emulation Porting Kit        PC; SPARC              TMDX3240030

(b) Hardware

Tool Description                       Operating System       Part Number
XDS510 Emulator                        PC/MS-DOS              TMDS3240130
Evaluation Module (EVM)                PC/MS-DOS              TMDS3260030

† Note that SUN UNIX supports ’C3x software tools on the 68000 family-based SUN-3 series workstations and on the SUN-4
series machines that use the SPARC processor, but not on the SUN-386i series of workstations.
11.2.1 Device and Development Support Tool Prefix Designators

Prefixes to TI part numbers designate phases in the product’s development
stage for both devices and support tools, as shown in the following definitions:

Device Development Evolutionary Flow 

□ TMX: Experimental device that is not necessarily representative of the 
final device’s electrical specifications 

□ TMP: Final silicon device that conforms to the device’s electrical
specifications but has not completed quality and reliability verification

□ TMS: Fully qualified production device 

Support Tool Development Evolutionary Flow 

□ TMDX: Development support product that has not yet completed TI’s
internal qualification testing for development systems

□ TMDS: Fully qualified development support product 

TMX and TMP devices and TMDX development support tools are shipped with 
the following disclaimer: 

“Developmental product is intended for internal evaluation purposes.” 


Note: Prototype Devices 

TI recommends that prototype devices (TMX or TMP) not be used in
production systems. Their expected end-use failure rate is undefined but
predicted to be greater than that of standard qualified production devices.


TMS devices and TMDS development support tools have been fully
characterized, and their quality and reliability have been fully demonstrated.
TI’s standard warranty applies to TMS devices and TMDS development support
tools.

TMDX development support products are intended for internal evaluation
purposes only. They are covered by TI’s warranty and update policy for
microprocessor development systems products; however, they should be used
by customers only with the understanding that they are developmental in
nature.
11.2.2 Device Suffixes

The suffix indicates the package type (for example, N, FN, or GE) and
temperature range (for example, L).

Figure 11-1 presents a legend for reading the complete device name for any 
TMS320 family member. 


Figure 11-1. TMS320 Device Nomenclature

Example: TMS 320 C 30 GE L

Prefix:
  TMX = Experimental device
  TMP = Prototype device
  TMS = Qualified device
  SMJ = MIL-STD-883C

Device family:
  320 = TMS320 family

Technology:
  C = CMOS
  E = CMOS EPROM
  P = OTP EPROM
  No letter = NMOS

Device:
  1st-generation DSP: 10, 14, 15, 16, 17
  2nd-generation DSP: 20, 25, 26
  3rd-generation DSP: 30, 31, 32
  4th-generation DSP: 40
  5th-generation DSP: 50, 51

Package type:
  FD = Leadless ceramic chip carrier
  FJ = Ceramic leaded chip carrier
  FN = Plastic leaded chip carrier
  FZ = Ceramic leaded chip carrier
  GB = Ceramic pin grid array
  GE = Ceramic pin grid array, glass seal
  HT = Ceramic quad flatpack (gull wing)
  HU = Ceramic quad flatpack
  JD = Ceramic dual in-line package, side brazed
  N = Plastic dual in-line package
  PQ = Plastic quad flatpack

Temperature range:
  H = 0 to 50°C
  L = 0 to 70°C
  S = -55 to 100°C
  M = -55 to 125°C
  A† = -40 to 85°C

† See electrical specifications for ’C31 PQA case temperature ratings.
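The legend in Figure 11-1 can be applied mechanically to decode a part number field by field. The sketch below uses a regular expression built from the legend. Two assumptions are not part of the figure: trailing digits (such as the "40" in TMS320C31PQL40 from Table 11-1) are treated as a speed grade in MHz, and the "LC" technology code (TMS320LC31) is deliberately not handled because the legend does not list it. The names are hypothetical.

```python
# Sketch: decode a TMS320 part number using the legend in Figure 11-1.
# Assumptions: trailing digits (e.g. "40" in TMS320C31PQL40) are treated as a
# speed grade in MHz, and the "LC" technology code (TMS320LC31) is not handled
# because the legend does not list it.
import re
from typing import Optional

PREFIX = {"TMX": "experimental device", "TMP": "prototype device",
          "TMS": "qualified device", "SMJ": "MIL-STD-883C"}
TECHNOLOGY = {"C": "CMOS", "E": "CMOS EPROM", "P": "OTP EPROM", "": "NMOS"}
TEMPERATURE = {"H": "0 to 50 C", "L": "0 to 70 C", "S": "-55 to 100 C",
               "M": "-55 to 125 C", "A": "-40 to 85 C"}

# prefix | family 320 | technology | device | package (1-2 letters, lazy) | temp | speed
PART_RE = re.compile(r"^(TMX|TMP|TMS|SMJ)320([CEP]?)(\d{2})([A-Z]{1,2}?)([HLSMA])(\d*)$")


def decode(part: str) -> dict:
    """Split a part number into the fields shown in Figure 11-1."""
    m = PART_RE.match(part)
    if m is None:
        raise ValueError(f"not a recognized TMS320 part number: {part}")
    prefix, tech, device, package, temp, speed = m.groups()
    speed_mhz: Optional[int] = int(speed) if speed else None
    return {"prefix": PREFIX[prefix], "technology": TECHNOLOGY[tech],
            "device": device, "package": package,
            "temperature": TEMPERATURE[temp], "speed_mhz": speed_mhz}
```

The lazy package group works because no package code in the legend has a second letter that is also a temperature code, so the regex backtracks from one letter to two only when needed (for example, GE in TMS320C30GEL).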
Chapter 12


TMS320C30 Power Dissipation 


This chapter presents the information necessary to determine the power supply
current requirements for the ’C30 under different operating conditions.

As device sophistication and levels of integration increase with evolving
semiconductor technologies, actual levels of power dissipation vary widely.
These levels depend heavily on the particular application in which the device is
used and the nature of the program being executed. In addition, due to the
characteristics of CMOS technology, power requirements vary according to clock
rates and data values being processed. Using this information, you can
determine the device’s power dissipation and, in turn, calculate thermal
management requirements.


12.1 Power Dissipation Characteristics 

Generally, power supply current requirements are related to the system: for
example, operating frequency, supply voltage, temperature, and output load. As
devices become more complex, the specification must also be based on what
the device does. CMOS devices inherently draw current only while switching
through the linear region. Therefore, the power supply current is related to the
rate of switching. Furthermore, since the output drivers of the ’C30 are specified
to drive direct current (dc) loads, the power supply current resulting from
external writes depends not only on the switching rate but also on the value of
the data written.

12.1.1 Power Supply Factors 

The power-supply current consists of four basic factors: 

□ Quiescent current 

□ Internal operations 

□ Internal bus operations 

□ External bus operations 

12.1.2 Power Supply Consumption Dependencies 

The power-supply current consumption depends on many factors. Four are 
system-related: 

□ Operating frequency 

□ Supply voltage 

□ Operating temperature 

□ Output load 

Several other factors are related to ’C30 operation. They include: 

□ Duty cycle of operations 

□ Number of buses used 

□ Wait states 

□ Cache usage 

□ Data values on the internal and external buses
The total power supply current for the device is described by the following
equation, which applies the four basic power supply current factors and the
dependencies described above:

$$ I = (I_{q} + I_{iops} + I_{ibus} + I_{xbus}) \times FV \times T $$


where: 

$I_{q}$ = quiescent current

$I_{iops}$ = current from internal operations

$I_{ibus}$ = current from internal bus usage, including data value and cycle
time dependencies

$I_{xbus}$ = current from external bus usage, including data value, wait state,
cycle time, and capacitive load dependencies

$FV$ = scale factor for frequency and supply voltage

$T$ = scale factor for operating temperature

The application of this equation and the determination of all of the
dependencies are described in detail in this chapter.
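The equation I = (Iq + Iiops + Iibus + Ixbus) × FV × T can be captured as a small helper for quick estimates. This is a minimal Python sketch, not a TI tool: the function name is hypothetical, and the defaults FV = T = 1.0 are an assumption corresponding to the 5.0-V, 33-MHz, 25°C reference conditions of the test setup. Each current term still has to be read from the figures described in this chapter.

```python
def total_supply_current_ma(i_q, i_iops, i_ibus, i_xbus, fv=1.0, t=1.0):
    """I = (Iq + Iiops + Iibus + Ixbus) x FV x T, with all currents in mA.

    fv -- frequency/supply-voltage scale factor (1.0 at 5.0 V, 33 MHz)
    t  -- temperature scale factor (1.0 at 25 degrees C)
    """
    return (i_q + i_iops + i_ibus + i_xbus) * fv * t


# Internal-only example: 110 mA quiescent + 55 mA internal operations
# + 44 mA internal bus, no significant external writes.
print(round(total_supply_current_ma(110, 55, 44, 0), 2))  # 209.0
```

Keeping FV and T as multipliers applied after the sum mirrors the equation: the frequency, voltage, and temperature dependencies scale the whole device current rather than any single factor.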

If a less detailed analysis is sufficient, use the minimum, typical, and maximum
values to make a rough estimate of the power supply current requirements:

□ The minimum power supply current requirement is 110 mA.

□ The typical (average) current consumption is 200 mA, as specified in the
TMS320C30 Digital Signal Processor data sheet. This value applies to most
algorithms running on the device unless data output is excessive.

□ If an extremely conservative approach is desired, use the maximum value.


Maximum Current Requirement 

The maximum current requirement is 600 mA and occurs only
under worst-case conditions. These include writing alternating
data (AAAAAAAAh to 55555555h) out of both external buses
simultaneously, every cycle, with 80-pF loads, and running at
33 MHz.
12.1.3 Determining Algorithm Partitioning

Each part of an algorithm has its own pattern of internal and external bus
usage. To analyze the power supply current requirement, partition the
algorithm into segments with distinct concentrations of internal or external
bus usage, and analyze each program segment to determine its power supply
current requirement. You can then calculate the average power supply current
from the requirements of the individual segments (see section 12.4.4).

12.1.4 Test Setup Description 

All ’C30 supply current measurements were performed on the test setup
shown in Figure 12-1. The test setup consists of a ’C30, 8K words of
zero-wait-state Cypress Semiconductor SRAMs (CY7C186-25PC), and
resistor/capacitor (RC) loads on all data and address lines. A Tektronix™
current probe (P6042) measures the power supply current in all VDD lines of
the device. The supply voltage on the output load is 2.15 V. Unless otherwise
specified, all measurements are made under the following conditions:

□ Supply voltage of 5.0 V 

□ Input clock frequency of 33 MHz 

□ Capacitive load of 80 pF 

□ Operating temperature of 25°C 

Figure 12-1. Current Measurement Test Setup for the TMS320C30
12.2 Current Requirements for Internal Circuitry 

The power supply current requirement for internal circuitry consists of the
following factors: quiescent current, internal operations, and internal bus
operations. Quiescent current and internal operations are constants, but the
internal bus operations vary with the rate of internal bus usage and the data
values being transferred.


12.2.1 Quiescent Current 

Quiescent current refers to the baseline supply current drawn by the ’C30
during minimal internal activity. It includes the current required to fetch an
instruction from on- or off-chip memory. Examples of quiescent current include:

□ Maintaining timers and serial ports 

□ Executing the IDLE instruction 

□ ’C30 in HOLD mode pending external bus access

□ ’C30 in reset 

□ Branching to self 

The quiescent requirement for the ’C30 equals 110 mA. 


12.2.2 Internal Operations 

Internal operations include register-to-register multiplication, ALU operations,
and branches. They do not include external bus usage or significant internal
bus usage. Internal operations add a constant 55 mA above the quiescent
current. Therefore, the total contribution of quiescent current (110 mA) and
internal operations (55 mA) is 165 mA. During an RPTS (repeat single)
instruction, activity other than the instruction being repeated is suspended;
therefore, internal power supply current is related only to the operation
performed by the instruction being executed.


12.2.3 Internal Bus Operations 

Internal bus operations include all operations that use the internal buses 
extensively, such as internal RAM access every cycle. No distinction is made 
between internal reads (such as instruction or operand fetches from internal 
ROM or internal RAM banks) and internal writes (such as operand stores to
internal RAM banks); internally they are equal. Since power consumption 
depends on the data value in the internal bus, significant use of internal buses 
adds a data-dependent factor to the power supply current. 


Pipeline conflicts, use of cache, fetches from external wait-state memory, and
writes to external wait-state memory all affect the internal and external bus
cycles of an algorithm executing on the ’C30. Therefore, you must determine
the algorithm’s internal bus usage in order to calculate the power supply
current requirements accurately. The ’C30 software simulator and XDS™
emulator both provide benchmarking and timing capabilities that help you
determine bus usage.

The current resulting from internal bus usage varies exponentially with transfer
rate. Figure 12-2 shows the internal bus current requirements for transferring
alternating data (AAAAAAAAh to 55555555h). A transfer cycle time of less than 1
implies multiple accesses per H1 cycle (for example, when using direct memory
access (DMA)). Transfer cycle times greater than 1 refer to single-cycle
transfers with one or more cycles between them. The minimum transfer cycle
time is one third, which corresponds to three accesses in a single H1 cycle.

Figure 12-2. Internal Bus Current Versus Transfer Rate (AAAAAAAAh to 55555555h)
(x-axis: transfer cycle time, in H1 cycles)

The data set AAAAAAAAh to 55555555h exhibits the maximum current for 
these types of operations. Less current is required for transferring other data 
patterns, and current values can be derated accordingly. 

As the transfer rate decreases (transfer cycle time increases), the incremental
IDD approaches 0 mA. Transfer cycle times longer than seven H1 cycles do not
add any current and are considered insignificant. Figure 12-2 represents the
incremental IDD from internal bus operations; this value is added to the
quiescent and internal operations current values.
For example, the maximum transfer rate corresponds to three accesses every 
cycle or one-third H1 transfer cycle time. At this rate, 85 mA is added to the 
quiescent (110 mA) and internal operation (55 mA) current values for a total 
of 250 mA. 

Figure 12-3 shows the data dependence of the internal bus current requirement
when the data is other than alternating As and 5s. The shaded trapezoidal
region represents the internal bus current consumed for all possible data
values transferred. The lower line represents the scale factor for transferring
the same data (all 0s or all Fs). The upper line represents the scale factor for
transferring alternating data (all 0s to all Fs, or all As to all 5s).

Figure 12-3. Internal Bus Current Versus Data Complexity Derating Curve
(x-axis: relative data complexity, 0 to 1)


The number of possible permutations of data values is quite large. The extent
to which data varies is referred to as relative data complexity: a relative
measure of how much the data values change and how many bits change state.
Relative data complexity ranges from 0, signifying minimal variation of data,
to a normalized value of 1, signifying greatest data variation.
If statistical knowledge of the data exists, you can use Figure 12-3 to
determine the power supply requirement due to internal bus usage more
precisely. For example, Figure 12-3 indicates a 63% scale factor when all Fs
are moved internally every cycle with two accesses per cycle. Multiplying this
scale factor by 55 mA (from Figure 12-2, at a one-half H1 cycle transfer time)
yields 34.65 mA due to internal bus usage. Therefore, an algorithm running
under these conditions requires about 200 mA of power supply current
(110 + 55 + 34.65 = 199.65 mA).

Because statistical knowledge of the data may not be readily available, you can
use a nominal scale factor instead. The midpoint between the minimum and
maximum values at 50% relative data complexity yields a value of 0.80, which
can serve as a nominal scale factor. Applying this nominal scale factor of 80%
to the 55-mA internal bus current of the previous example adds 44 mA to
110 mA (quiescent current) and 55 mA (internal operations), for a total of
about 210 mA. As an upper bound, assume worst-case conditions of three
accesses of alternating data every cycle, which adds 85 mA (from Figure 12-2)
to 110 mA (quiescent current) and 55 mA (internal operations), for a total of
250 mA.
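The internal current arithmetic in the three examples above can be sketched as a single function. This is a minimal Python sketch that assumes you have already read the data scale factor D1 from Figure 12-3 and the internal bus current from Figure 12-2 by hand; the function and constant names are hypothetical.

```python
QUIESCENT_MA = 110.0      # section 12.2.1
INTERNAL_OPS_MA = 55.0    # section 12.2.2


def internal_current_ma(d1, f1_ma):
    """Quiescent + internal operations + scaled internal bus current, in mA.

    d1    -- data scale factor read from Figure 12-3 (0.0 to 1.0)
    f1_ma -- internal bus current read from Figure 12-2 for the algorithm's
             transfer cycle time (alternating AAAAAAAAh/55555555h data)
    """
    return QUIESCENT_MA + INTERNAL_OPS_MA + d1 * f1_ma


# All Fs, two accesses per cycle (D1 = 0.63, 55 mA at one-half H1 cycle)
print(round(internal_current_ma(0.63, 55.0), 2))  # 199.65
# Nominal data scale factor of 0.80 at the same transfer rate
print(round(internal_current_ma(0.80, 55.0), 2))  # 209.0
# Worst case: alternating data, three accesses per cycle (85 mA)
print(round(internal_current_ma(1.0, 85.0), 2))   # 250.0
```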
12.3 Current Requirement for Output Driver Circuitry

The output driver circuits on the ’C30 must drive significantly higher dc and
capacitive loads than the internal device logic. Therefore, they are designed to
drive larger currents than internal devices and, as a result, impose higher
supply current requirements than other sections of circuitry on the device.

Accordingly, the highest values of supply current are required when external 
writes are performed at high speed. During reads, or when the external buses 
are not in use, the ’C30 does not drive the data bus; this eliminates the most 
significant factor of output buffer current. Furthermore, in typical cases, only 
a few address lines change, or the whole address bus is static. Under these 
conditions, an insignificant amount of supply current is consumed. When no 
external writes are performed or when writes are performed infrequently, current
from output buffer circuitry can be ignored.

When external writes are performed, the current required to supply the output 
buffers depends on several factors: 

□ Data pattern transferred 

□ Rate at which transfers are made 

□ Number of wait states implemented (because wait states affect rates at 
which bus signals switch) 

□ External bus dc and capacitive loading 

External operations involve writes external to the device and constitute the
major power supply current factor. The power supply current for the external
buses is made up of three factors and is summarized in the following equation:

$$ I_{xbus} = I_{base} + I_{prim} + I_{exp} $$

where:

$I_{base}$ = 60-mA baseline current
$I_{prim}$ = primary bus current
$I_{exp}$ = expansion bus current

The remainder of this section describes in detail the calculation of external bus 
current factors. 
12.3.1 Primary Bus Current

The current from primary bus writes varies with both wait states and write cycle
time. Current factors from output driver circuitry are represented as offsets
from the previously computed value (quiescent + internal operations + internal
bus). Because the baseline value is related to internal current factors,
negative values for current offset are obtained under some circumstances.
However, the device never actually draws negative current.

To obtain accurate current values, you must first establish the timing of write 
cycles of the buses. To determine the rate and timings at which write cycles 
to the external buses occur, you must analyze program activity, including any 
pipeline conflicts that may exist. Information from this manual and the ’C30 
emulator or simulator is useful in making these determinations. You must 
account for the effects of cache use in these analyses because the cache can 
affect whether instructions are fetched from external memory. 

When evaluating external write activity in a given program segment, you must
consider whether a particular level of external write activity is significant. If
writes are performed at very slow rates on both the primary and the expansion
buses, the current from external writes can be ignored. If writes are performed
at high speed on only one of the two external buses, you should still calculate
current requirements.

Although you can obtain negative incremental current values under some 
circumstances, the total contribution for external buses, including baseline 
current, is always positive. When external buses are not used much, the total 
current requirements approach the current contribution from the internal factors,
which is solely a function of internal activity. This places a lower limit on
current contributions from the primary and expansion buses, because the total 
current from external buses is the sum of the 60-mA baseline value and the 
primary and expansion bus factors. This effect is discussed in further detail in 
the rest of this section. 

Once you establish bus-write cycle timing, use Figure 12-4 to determine the
contribution to supply current from this bus activity. Figure 12-4 shows current
contributions from the primary bus for various numbers of wait states and H1
cycles between writes. This current contribution is exhibited when writes of
alternating 55555555h and AAAAAAAAh are performed at a capacitive load of
80 pF per output signal line, the condition that exhibits the highest current
values on the device. Each curve in the figure represents the incremental, or
additional, current contributed by the primary bus output driver circuitry while
writing alternating 55555555h and AAAAAAAAh. Current values obtained from
this graph are scaled and added to several other current values to calculate
the total current for the device. As indicated in the figure, the lower curve
represents the current contribution for 18 or more cycles between writes.
Figure 12-4. Primary Bus Current Versus Transfer Rate and Wait States
(x-axis: wait states, 0 to 7; curves for different numbers of H1 cycles between writes)

The number of cycles between writes refers to the number of H1 cycles between
the active portions of the write cycles (as defined in the TMS320C30 Digital
Signal Processor data sheet), that is, when STRB, MSTRB, or IOSTRB and R/W
(or XR/W, as the case may be) are low between H1 cycles. As shown in
Figure 12-4, the minimum number of cycles between writes is 1, because with
back-to-back writes there is one H1 cycle between active portions of the writes.

To further illustrate the relationship between current and write cycle time,
Figure 12-5 shows the characteristics of current for various numbers of cycles
between writes at zero wait states. If zero wait states are used and the number
of cycles between writes does not fall on one of the curves in Figure 12-4, you
can use this curve to obtain more precise current values.
Figure 12-5. Primary Bus Current Versus Transfer Rate at Zero Wait States
(x-axis: H1 cycles between writes, 0 to 20)


Although these graphs contain negative current values, the device does not
actually draw negative current. The negative values exist because the graphs
represent a current offset from the previously computed current value.
Depicting the contributions of different factors in this way breaks the current
calculation into parts that you can compute independently.

Figure 12-4 and Figure 12-5 show that the incremental current during external
bus writes is negative if writes are performed at intervals of more than 18
cycles. Under these conditions, use an incremental current contribution of
-30 mA from the primary bus. Use a value of -30 mA only if the expansion bus is
used extensively, because the total contribution for external buses, including
baseline current, must always be positive. If the expansion bus is not used and
the primary bus is not used much, the current contribution from the primary bus
is always greater than or equal to 20 mA. This ensures that the correct total
current value is obtained when summing external bus factors. Once a current
value has been obtained from Figure 12-4 or Figure 12-5, this value can, if
necessary, be scaled by a data dependency factor, as described in
section 12.3.3 on page 12-14. This scaled value is then summed with several
other current values to determine the total supply current.
12.3.2 Expansion Bus Current

Currents from the primary and expansion buses differ slightly for several
reasons, including the fact that the expansion bus has 11 fewer address outputs
than the primary bus (13 rather than 24). As a result, the overall current
contribution from the expansion bus is slightly lower than that from the
primary bus.

Determining the expansion bus current uses the same premise as determining 
the primary bus current. Figure 12-6 and Figure 12-7 show the same current 
relationships for the expansion bus as Figure 12-4 and Figure 12-5 show for 
the primary bus. The total current contribution of the external buses must be
positive; if the primary bus is not used and the expansion bus is not used much,
the minimum current contribution from the expansion bus is -30 mA. The current
values obtained from these figures must be scaled by a data dependency
factor, as described in section 12.3.3 on page 12-14.

Figure 12-6. Expansion Bus Current Versus Transfer Rate and Wait States
(x-axis: wait states)

Figure 12-7. Expansion Bus Current Versus Transfer Rate at Zero Wait States
(x-axis: H1 cycles between writes)


12.3.3 Data Dependency Factors 

Data dependency of current for the primary and expansion buses is expressed 
as a scale factor that is a percentage of the maximum current of either of the 
two buses. Data dependencies are shown in Figure 12-8 for the primary bus 
and in Figure 12-9 for the expansion bus. 

These two figures show normalized weighting factors that you can use to scale
current requirements on the basis of patterns in the data being written on the
external buses. The range of possible weighting factors forms a trapezoidal
pattern bounded by extremes of data values. As Figure 12-8 and Figure 12-9
show, the minimum current is exhibited when writing all 0s, while the maximum
current occurs when writing alternating 55555555h and AAAAAAAAh. This
condition results in a weighting factor of 1, which corresponds to using the
values from Figure 12-4 and/or Figure 12-5 (or, for the expansion bus,
Figure 12-6 and/or Figure 12-7) directly.

As with internal bus operations, data dependencies for the external buses are
well defined, but accurate prediction of data patterns is often impractical.
Unless you have precise knowledge of data patterns, you should use an estimate
of a median or average value for the scale factor. If you assume that the data
is neither alternating 5s and As nor all 0s, and varies randomly, a value of 0.85
is appropriate. If you prefer a conservative approach, you can use a value of
1.0 as an upper bound.
Figure 12-8. Primary Bus Current Versus Data Complexity Derating Curve



Figure 12-9. Expansion Bus Current Versus Data Complexity Derating Curve 
Regardless of the approach you take for scaling, once you determine the scale
factors for the primary and expansion buses, apply them to the current values
found by using the graphs in the previous two sections. For example, if a
nominal scale factor of 0.85 is used and the system uses zero wait states with
two cycles between accesses on both the primary and expansion buses, the
current contribution from the two buses is as follows:

Primary: 0.85 x 80 mA = 68 mA 

Expansion: 0.85 x 40 mA = 34 mA 

12.3.4 Capacitive Load Dependence 

Once you account for cycle timing and data dependencies, calculate and apply
the capacitive loading effects. If the load capacitance presented to the buses
is less than 80 pF, Figure 12-10 shows the scale factor to apply to the current
values obtained above as a function of actual load capacitance.

In the previous example, if the load capacitance is 20 pF instead of 80 pF, a 
scale factor of 0.84 is used, yielding: 

Primary: 0.84 x 68 mA = 57.12 mA 

Expansion: 0.84 x 34 mA = 28.56 mA 
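The data and capacitance scaling steps simply multiply together, which is the form Iprim = D2 × C2 × F2 (and Iexp = D3 × C3 × F3) used in the total current equation of section 12.4.3. The following minimal Python sketch reproduces the worked example above; the function name and its default arguments are illustrative assumptions, and reading F from the graphs remains a manual step.

```python
def bus_current_ma(f_ma, data_scale=0.85, cap_scale=1.0):
    """Scaled incremental current for one external bus, in mA.

    f_ma       -- current read from Figure 12-4/12-5 (primary bus) or
                  Figure 12-6/12-7 (expansion bus) for the write timing
    data_scale -- D factor from Figure 12-8/12-9 (0.85 nominal, 1.0 worst case)
    cap_scale  -- C factor from Figure 12-10 (1.0 at 80 pF)
    """
    return data_scale * cap_scale * f_ma


# Zero wait states, two cycles between writes, nominal data, 20-pF loads
primary = bus_current_ma(80.0, data_scale=0.85, cap_scale=0.84)
expansion = bus_current_ma(40.0, data_scale=0.85, cap_scale=0.84)
print(round(primary, 2), round(expansion, 2))  # 57.12 28.56
```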

The slope of the load capacitance line in Figure 12-10 is 0.26% normalized IDD
per pF, consistent with the scale factors of 0.84 at 20 pF and 1.0 at 80 pF.
While this slope may be used to extrapolate scale factors for loads greater
than 80 pF, the ’C30 is specified to drive output loads of less than 80 pF.
Interface timings cannot be ensured at higher loads.
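When the graph is not at hand, the capacitance scale factor can be approximated by a straight line anchored at 1.0 for 80 pF. This is a minimal Python sketch under an explicit assumption: a slope of 0.26% normalized IDD per pF, the value that matches the worked example (0.84 at 20 pF, 1.0 at 80 pF). The function name is hypothetical; reading Figure 12-10 directly remains the more reliable method.

```python
SLOPE_PER_PF = 0.0026  # assumed 0.26% normalized IDD per pF (Figure 12-10)


def cap_scale_factor(load_pf):
    """Linear estimate of the Figure 12-10 scale factor (1.0 at 80 pF).

    Loads above 80 pF are rejected instead of extrapolated, because the
    'C30 is not specified (and interface timing is not ensured) there.
    """
    if load_pf > 80.0:
        raise ValueError("'C30 outputs are specified for loads of 80 pF or less")
    return 1.0 - SLOPE_PER_PF * (80.0 - load_pf)


print(round(cap_scale_factor(20.0), 3))  # 0.844, close to the 0.84 read from the graph
```

Rejecting loads above 80 pF is a design choice of this sketch: it keeps the estimate inside the range where the ’C30 interface timings are specified.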

Figure 12-10. Current Versus Output Load Capacitance
(x-axis: output load capacitance, pF)
12.4 Calculation of Total Supply Current

The previous sections discuss currents contributed by several sources on the
’C30. Because actual current values are unique and independent for each
source, each current source is discussed separately. In an actual application,
however, the sum of the independent contributions from each current source
determines the total current requirement for the device. This current value is
the total current supplied to the device through all of the VDD inputs and
returned through the VSS connections.

Note that the numerous VDD and VSS pins on the device are routed to a variety
of internal connections, not all of which are common. Externally, however, all
VDD pins must be connected in parallel to a 5-V source, and all VSS pins must
be connected to ground planes with as little impedance as possible.


12.4.1 Combining Supply Current from All Factors

To determine the total supply current requirements for any given program
activity, calculate each of the appropriate factors and combine them in the
following sequence:

1) Start with 110-mA quiescent current. 

2) Add 55 mA for internal operations unless the device is dormant. Dormant
periods occur during the execution of IDLE, NOPs, or branches to self, or
while internal and/or external bus operations are performed by an RPTS
instruction (see section 12.2.2 on page 12-5). Internal or external bus
operations executed through RPTS do not contribute an internal operations
power supply current factor. However, the current factors in the next two
steps may still be required, even though the 55 mA is omitted.

3) If significant internal bus operations are performed, add the calculated
current value. (See section 12.2.3 on page 12-5.)

4) If external writes are performed at high speed, add 60 mA and then add
the values for the primary and expansion bus current factors. (See
section 12.3 on page 12-9.) If only one external bus is used, the appropriate
incremental current for the unused bus must still be included, because the
current offsets include factors required for operating both buses. The total
current contribution for external buses, including baseline, is always
positive.

The current value obtained from summing these factors is the total device 
current requirement for a given program activity. 
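The four steps can be expressed as one function. This is a minimal Python sketch; the function name, its arguments, and the choice to raise an error when only one bus offset is supplied (encoding the rule that the unused bus's offset must still be included) are illustrative assumptions. The bus offsets passed in are assumed to be already scaled for data and capacitive load.

```python
QUIESCENT_MA = 110.0      # step 1
INTERNAL_OPS_MA = 55.0    # step 2
EXTERNAL_BASE_MA = 60.0   # step 4 baseline


def combined_current_ma(i_ibus_ma=0.0, i_prim_ma=None, i_exp_ma=None,
                        dormant=False):
    """Sum the supply current factors in the order of section 12.4.1 (mA).

    i_ibus_ma -- internal bus current from step 3 (0.0 if insignificant)
    i_prim_ma, i_exp_ma -- scaled primary and expansion bus offsets for
        step 4; pass both (even for an unused bus) or neither (no
        high-speed external writes)
    dormant -- True during IDLE, NOPs, branches to self, or bus
        operations repeated with RPTS
    """
    total = QUIESCENT_MA                       # step 1
    if not dormant:
        total += INTERNAL_OPS_MA               # step 2
    total += i_ibus_ma                         # step 3
    if (i_prim_ma is None) != (i_exp_ma is None):
        raise ValueError("include the offsets for both buses, even an unused one")
    if i_prim_ma is not None:                  # step 4
        external = EXTERNAL_BASE_MA + i_prim_ma + i_exp_ma
        if external <= 0.0:
            raise ValueError("total external bus contribution must be positive")
        total += external
    return total


# Illustration only: nominal internal bus use (44 mA) combined with the
# 20-pF external bus example (57.12 mA primary, 28.56 mA expansion).
print(round(combined_current_ma(44.0, 57.12, 28.56), 2))  # 354.68
```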
12.4.2 Supply Voltage, Operating Frequency, and Temperature Dependencies

Current dependencies specific to each supply current factor (such as internal 
or external bus operations) are discussed in section 12.1.2 on page 12-2. 
Supply voltage level, operating temperature, and operating frequency affect 
the requirements for the total supply current and must be maintained within the 
required device specifications. 

Once you determine the total current for a particular program segment, the 
dependencies that affect the total current requirements are applied as a scale 
factor in the same manner as data dependencies discussed in other sections. 
Figure 12-11 shows the relative scale factors for the supply current values as 
a function of both VDD and operating frequency.

Power supply current consumption does not vary significantly with operating
temperature. However, a scale factor of 2% normalized IDD per 50°C change
in operating temperature may be used to derate current within the specified
range noted in the TMS320C30 Digital Signal Processor data sheet. This
temperature dependence is shown graphically in Figure 12-12. A temperature
scale factor of 1.0 corresponds to current values at 25°C, which is the
reference temperature for all values in this document.

Figure 12-11. Current Versus Frequency and Supply Voltage
(curves for VDD = 4.5 V, 4.75 V, 5.0 V, 5.25 V, and 5.5 V, plotted against operating frequency)
Figure 12-12. Current Versus Operating Temperature Change
(x-axis: change in operating temperature, °C, from -80 to 80)


12.4.3 Total Current Equation Example 

The procedure for determining the power supply current requirement is
summarized in the following equation:

$$ I = (I_{q} + I_{iops} + I_{ibus} + I_{xbus}) \times FV \times T $$

where:

$I_{q}$ = 110 mA

$I_{iops}$ = 55 mA

$I_{ibus} = D_{1} \times F_{1}$ (see Table 12-1 on page 12-20)

$I_{xbus} = I_{base} + I_{prim} + I_{exp}$

with

$I_{base}$ = 60 mA

$I_{prim} = D_{2} \times C_{2} \times F_{2}$ (see Table 12-1)

$I_{exp} = D_{3} \times C_{3} \times F_{3}$ (see Table 12-1)

$FV$ = scale factor for frequency and supply voltage

$T$ = scale factor for operating temperature

Table 12-1 describes the variables used in the power supply current equation.
For each variable, the table gives its value or the figure from which the value
can be obtained.
Table 12-1. Current Equation Variables

Variable   Description                                   Graph/Value
--------   -------------------------------------------   --------------------------
I_q        Quiescent current                             110 mA
I_iops     Internal operations current                   55 mA
I_ibus     Internal bus operations current               †
D_1        Internal bus data scale factor                Figure 12-3
F_1        Internal bus current requirement              Figure 12-2
I_xbus     External bus operations current               †
I_base     External bus base current                     60 mA
I_prim     Primary bus operations current                †
D_2        Primary bus data scale factor                 Figure 12-8
C_2        Primary bus capacitance load scale factor     Figure 12-10
F_2        Primary bus current requirement               Figure 12-4 or Figure 12-5
I_exp      Expansion bus operations current              †
D_3        Expansion bus data scale factor               Figure 12-9
C_3        Expansion bus capacitance load scale factor   Figure 12-10
F_3        Expansion bus current requirement             Figure 12-6 or Figure 12-7
FV         Frequency/supply voltage scale factor         Figure 12-11
T          Temperature scale factor                      Figure 12-12

† See power supply current equation on page 12-19.

12.4.4 Peak Versus Average Current 

If current is observed over the course of an entire program, some segments usu¬ 
ally exhibit significantly different levels of current required for different durations 
of time. For example, a program may spend 80% of its time performing internal 
operations, drawing a current of 250 mA; it may spend the remaining 20% of its 
time performing writes at full speed to the expansion bus, drawing 300 mA. 

While knowledge of peak current levels is important in order to establish power
supply requirements, some applications require information about average
current. This is particularly significant if periods of high peak current are short
in duration. Average current can be obtained by performing a weighted sum
of the currents from the various independent program segments over time. In
the example above, the average current can be calculated as follows:

$$ I = 0.8 \times 250\ \text{mA} + 0.2 \times 300\ \text{mA} = 260\ \text{mA} $$

Using this approach, you can calculate average current for any number of
program segments.
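The weighted sum is easy to automate when a program has many segments. This is a minimal Python sketch; representing each segment as a (fraction of time, current in mA) pair and the function name are assumptions of the example.

```python
def average_current_ma(segments):
    """Time-weighted average of (fraction_of_time, current_ma) pairs."""
    total_fraction = sum(frac for frac, _ in segments)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(frac * current for frac, current in segments)


segments = [(0.8, 250.0),   # internal operations
            (0.2, 300.0)]   # full-speed expansion bus writes
print(round(average_current_ma(segments), 2))  # 260.0
print(max(current for _, current in segments))  # peak: 300.0
```

The peak value (300 mA here) still governs power supply sizing; the average is what matters for thermal estimates when segments are short.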
12.4.5 Thermal Management Considerations

Heating characteristics of the ’C30 depend on power dissipation, which in turn
depends on power supply current. When you make thermal management
calculations, you must consider how power supply current contributes to power
dissipation and to the time constant of the ’C30 package thermal
characteristics.

Depending on sources and destinations of current on the device, some current
contributions to IDD do not constitute a factor of power dissipation at 5 V.
Accordingly, if you use the total current flowing into VDD to calculate power
dissipation at 5 V, you obtain erroneously large values for power dissipation.
Power dissipation is defined as:

$$ P = I \times V $$

where: 

P = power 

I = current 

V = voltage 

If device outputs are driving any dc load to a logic high level, that load current
makes only a minor contribution to device power dissipation, because CMOS
outputs typically drive to a level within a few tenths of a volt of the power supply
rails. In this case, subtract these current factors from the total supply current
value; then calculate their contribution to power dissipation separately and add
it to the total power dissipation (see Figure 12-13). If this is not done, the
currents resulting from driving a logic high level into a dc load produce
unrealistically high power dissipation values. The error occurs because these
currents appear as a portion of the current used to calculate power dissipation
at 5 V.
Figure 12-13. Load Currents
(panels: device output driven high, IOH; device output driven low, IOL)

Furthermore, external loads draw current from the device supply only when
outputs are driven high. When outputs are in the logic 0 state, the device sinks
current that is supplied from an external source. This current does not flow
through IDD, but it contributes to power dissipation with a magnitude of:

$$ P = V_{OL} \times I_{OL} $$

where:

$V_{OL}$ = low-level output voltage

$I_{OL}$ = current being sunk by the output (as shown in Figure 12-13)

The power dissipation factor from outputs that are driven low must be calcu¬ 
lated and added to the total power dissipation. 

When outputs with dc loads are switched, the power dissipation factors from outputs being driven high and outputs being driven low are averaged and added to the total device power dissipation. You should calculate power factors from dc loading of the outputs separately for each program segment before you calculate average power.
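The output-loading terms described above can be tabulated per output with a few helpers. This is a minimal sketch: the low-state term is the text's $P = V_{OL} \times I_{OL}$ and the switching case averages the high and low factors as the text says, but the high-state term $(V_{DD} - V_{OH}) \times I_{OH}$, the function names, and all numeric values are illustrative assumptions rather than data-sheet figures.

```python
# Minimal sketch of the dc-load output terms for one output pin.
# Units: volts, amperes, watts. All names are hypothetical.


def p_out_low(v_ol: float, i_ol: float) -> float:
    """Device power for an output sinking I_OL at V_OL: P = V_OL x I_OL."""
    return v_ol * i_ol


def p_out_high(v_dd: float, v_oh: float, i_oh: float) -> float:
    """ASSUMPTION (not stated in the text): an output sourcing I_OH at
    V_OH dissipates the rail-to-output drop inside the device,
    (V_DD - V_OH) x I_OH."""
    return (v_dd - v_oh) * i_oh


def p_out_switching(p_high: float, p_low: float) -> float:
    """Output that switches during a segment: average the high-state
    and low-state factors, as the text describes."""
    return (p_high + p_low) / 2.0


if __name__ == "__main__":
    # Hypothetical operating point, not data-sheet values.
    p_hi = p_out_high(5.0, 4.6, 0.002)
    p_lo = p_out_low(0.5, 0.004)
    print(f"high: {p_hi * 1e3:.2f} mW, low: {p_lo * 1e3:.2f} mW, "
          f"switching: {p_out_switching(p_hi, p_lo) * 1e3:.2f} mW")
```

Summing these per-output terms for each program segment gives the dc-load contribution that is then added to the segment's supply-current power.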

Any unused inputs that are left disconnected may float to a voltage level that causes input buffer circuits to remain in the linear region and therefore contribute a significant factor to power supply current. Accordingly, if you want absolute minimum power dissipation, deactivate any unused inputs by grounding them or pulling them high. If you must pull several unused inputs high, pull them high together through one resistor to minimize component count and board space.
When you use power dissipation values to determine thermal requirements, use the average power unless the individual program segments are long. The thermal characteristics of the ’C30 in the 181-pin grid array (PGA) package are exponential in nature, with a time constant of $\tau = 4.5$ minutes. When subjected to a change in power, the temperature of the device package will, after 4.5 minutes, reach approximately 63% of the total temperature change. Accordingly, if the program segments exhibiting high power dissipation values are short (on the order of a few seconds), you can use average power, calculated in the same manner as average current (as described in section 12.4.4 on page 12-20).

Otherwise, you should calculate maximum device temperature on the basis of 
the actual time duration of the program segments involved. For example, if a 
particular program segment lasts for seven minutes, you can calculate that a 
device will reach approximately 80% of the temperature change from the total 
power dissipation during the program segment. 
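The percentages above follow from a first-order thermal response. The sketch below assumes a single exponential, $\Delta T(t) = \Delta T_{final}\,(1 - e^{-t/\tau})$ with $\tau = 4.5$ minutes, which reproduces about 63% at one time constant and about 79% (the "approximately 80%" above) for a seven-minute segment. The single-time-constant model and the function name are assumptions made for illustration.

```python
import math

TAU_MIN = 4.5  # package thermal time constant from the text, in minutes


def temp_fraction(t_min: float, tau_min: float = TAU_MIN) -> float:
    """Fraction of the final temperature change reached after t_min
    minutes, assuming a single first-order exponential response."""
    return 1.0 - math.exp(-t_min / tau_min)


if __name__ == "__main__":
    for t in (4.5, 7.0, 20.0):
        print(f"{t:5.1f} min -> {100 * temp_fraction(t):5.1f}% of final rise")
```

A segment that lasts several time constants should therefore be treated as reaching its full steady-state temperature rather than being averaged.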

You can determine average power by calculating the power for each program segment (including the previous considerations) and performing a time average of these values, rather than simply multiplying the average current, as determined in the previous section, by $V_{DD}$.
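The per-segment averaging can be sketched as follows. Each segment's power is its supply current with the high-level dc-load current removed, times $V_{DD}$, plus that segment's separately computed dc-load terms; the segment powers are then time-averaged. The helper names and the segment values in the usage block are hypothetical.

```python
def segment_power(i_dd: float, v_dd: float,
                  i_oh_load: float = 0.0, p_dc_terms: float = 0.0) -> float:
    """Power (W) for one program segment: supply current with the
    high-level dc-load current removed, times V_DD, plus that segment's
    separately computed dc-load terms."""
    return (i_dd - i_oh_load) * v_dd + p_dc_terms


def time_average(segments) -> float:
    """Time-weighted average of (duration, value) pairs."""
    segments = list(segments)
    total = sum(d for d, _ in segments)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(d * v for d, v in segments) / total


if __name__ == "__main__":
    v_dd = 5.0
    # Hypothetical segments: (duration in s, I_DD in A,
    # I_OH into dc loads in A, separately computed dc-load power in W).
    segs = [(9.5, 0.209, 0.000, 0.000),
            (0.5, 0.279, 0.010, 0.006)]
    p_avg = time_average((d, segment_power(i, v_dd, ioh, pdc))
                         for d, i, ioh, pdc in segs)
    i_avg = time_average((d, i) for d, i, _, _ in segs)
    print(f"average of segment powers: {p_avg:.4f} W")
    print(f"average current x V_DD:    {i_avg * v_dd:.4f} W")
```

The two printed figures differ whenever segments carry dc-load corrections, which is why the text recommends averaging segment powers.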

Specific device temperature calculations are made using the ’C30 thermal 
impedance characteristics in the TMS320C30 Digital Signal Processor data 
sheet. 
12.5 Example Supply Current Calculations

A fast Fourier transform (FFT) is a typical DSP algorithm. The FFT code in this example calculation processes data in the RAM blocks and writes the results out to zero-wait-state external SRAM on the primary bus. The program executes out of zero-wait-state external SRAM on the primary bus with the ’C30’s cache enabled. Most of the algorithm consists of internal bus operations, so its current is made up of quiescent current, internal operations, and internal bus operations. At the end of processing, the 1024 results are written to the primary bus. Therefore, the algorithm exhibits a higher current requirement during this write portion, where the external bus is used heavily.


12.5.1 Processing 


The processing portion of the algorithm accounts for 95% of the FFT execution time. During this portion, power supply current is required only for the internal circuitry. Data is processed in several loops, and during these loops two operands are transferred on every cycle. The current required for internal bus operations is 55 mA (see section 12.2.2 on page 12-5). Because the data is assumed to be random, a data value scale factor of 0.8 is used from Figure 12-3 on page 12-7. This factor scales 55 mA to 44 mA for internal bus operations. Adding 44 mA to the quiescent current requirement and the internal operations current requirement yields a current requirement of 209 mA for the major portion of the algorithm.

$I = I_Q + I_{IOPS} + I_{IBUS}$

$I = 110\ \text{mA} + 55\ \text{mA} + (55\ \text{mA})(0.8) = 209\ \text{mA}$
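The processing-phase estimate restated as a function: quiescent current plus internal-operations current plus internal-bus current derated by the data-value scale factor (0.8 for random data, from Figure 12-3). The default values are the ones used in the text; the function and parameter names are hypothetical.

```python
def processing_current_ma(i_q: float = 110.0, i_iops: float = 55.0,
                          i_ibus: float = 55.0,
                          ibus_scale: float = 0.8) -> float:
    """Supply current (mA) for the internal-only FFT processing phase:
    I = I_Q + I_IOPS + scale * I_IBUS, where scale derates the
    internal-bus current for the data values being moved."""
    return i_q + i_iops + ibus_scale * i_ibus


if __name__ == "__main__":
    print(f"processing phase: {processing_current_ma():.1f} mA")
    # Same model with no data-value derating (scale factor 1.0):
    print(f"without derating: {processing_current_ma(ibus_scale=1.0):.1f} mA")
```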
12.5.2 Data Output


The portion of the FFT corresponding to writing out data is approximately 5% of the total processing time. Again, the data being written is assumed to be random. From Figure 12-3 on page 12-7 and Figure 12-8 on page 12-15, scale factors of 0.80 and 0.85 are used to derate for data value dependency on the internal and primary buses, respectively. During the data dump portion of the code, a load and a store are performed every cycle. The parallel load/store instruction is in an RPTS loop, so the instruction is fetched only once and internal operations make no contribution. The only internal contributions are from quiescent current and internal bus operations. Figure 12-5 on page 12-12 indicates a 170-mA current contribution from back-to-back zero-wait-state writes, and Figure 12-7 on page 12-14 indicates a -80-mA contribution when the expansion bus is idle (that is, with more than 18 H1 cycles between writes). The total current for this portion of the code is:

$I = I_Q + I_{IBUS} + I_{XBUS}$

or

$I = 110\ \text{mA} + (55\ \text{mA})(0.8) + 60\ \text{mA} - 80\ \text{mA} + (170\ \text{mA})(0.85) = 278.5\ \text{mA}$
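The data-output sum can be checked term by term. This excerpt does not tie the 60-mA term to a specific figure, so the sketch carries it unchanged alongside the -80-mA idle-expansion-bus term. The function and parameter names are hypothetical.

```python
def data_output_current_ma(i_q: float = 110.0,
                           i_ibus: float = 55.0, ibus_scale: float = 0.80,
                           fixed_terms: tuple = (60.0, -80.0),
                           i_pbus: float = 170.0,
                           pbus_scale: float = 0.85) -> float:
    """Supply current (mA) for the FFT data-dump phase.

    Sums quiescent current, derated internal-bus current, the fixed
    60-mA and -80-mA terms exactly as they appear in the text's
    equation, and derated back-to-back primary-bus write current.
    No internal-operations term is included: the parallel load/store
    runs from an RPTS loop, so it is fetched only once.
    """
    return (i_q + ibus_scale * i_ibus + sum(fixed_terms)
            + pbus_scale * i_pbus)


if __name__ == "__main__":
    print(f"data output phase: {data_output_current_ma():.1f} mA")
```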

12.5.3 Average Current 

The average current is derived from the two portions of the FFT. The processing portion takes 95% of the time and requires about 210 mA, and the data dump portion takes the other 5% and requires about 280 mA. The average is calculated as:

$I_{avg} = (0.95)(210\ \text{mA}) + (0.05)(280\ \text{mA}) = 213.5\ \text{mA}$
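The weighted average is easy to reproduce. Using the rounded phase currents from the text gives 213.5 mA; feeding in the unrounded values (209 mA and 278.5 mA) gives about 212.5 mA, so the rounding does not change the conclusion. The helper name is hypothetical.

```python
PHASES = [
    # (fraction of FFT run time, phase current in mA), rounded as in the text
    (0.95, 210.0),  # processing (209 mA before rounding)
    (0.05, 280.0),  # data dump (278.5 mA before rounding)
]


def average_current_ma(phases) -> float:
    """Time-weighted average current over program phases.
    Each phase is (fraction_of_time, current_mA); fractions must sum to 1."""
    phases = list(phases)
    if abs(sum(f for f, _ in phases) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(f * i for f, i in phases)


if __name__ == "__main__":
    print(f"I_avg (rounded inputs):   {average_current_ma(PHASES):.2f} mA")
    print(f"I_avg (unrounded inputs): "
          f"{average_current_ma([(0.95, 209.0), (0.05, 278.5)]):.2f} mA")
```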

From the thermal characteristics specified in the ’C30 data sheet, it can be 
shown that this current level corresponds to a case temperature of 43°C. This 
temperature meets the maximum device specification of 85°C and, hence, 
requires no forced air cooling. 
12.5.4 Experimental Results

A photograph of the power supply current for the FFT is shown in Figure 12-14. During the FFT processing, the measured current varies between 180 and 220 mA. The peak of the current during external writes is 270 mA, and the average current requirement, as measured on a digital multimeter, is 200 mA. The calculations agree closely with the measured power supply current: the calculated average of 213.5 mA is within about 7% of the measured 200 mA, and the calculated 278.5 mA for the write portion is close to the measured 270-mA peak.

Figure 12-14. Photo of $I_{DD}$ for FFT

[Oscilloscope photograph not reproduced; horizontal scale 500 µs/div.]

Note: Input clock frequency = 33 MHz, $V_{DD}$ = 5.0 V
Appendix A


TMS320C32 Boot Table Examples 


The ’C32 boot loader loads programs received from standard memory devices or through the serial port. These programs have a particular data stream structure called a boot table. This appendix shows examples of ’C32 boot tables stored in 32-, 16-, and 8-bit-wide ROM and of a boot table transmitted through the serial port.

Figure A-1 through Figure A-4 show four instances of the boot table, each containing four blocks. The destination for the first and third blocks of each boot table is 16-bit STRB0 memory. The second block is booted to the 32-bit IOSTRB memory. Block 4 is destined for the 8-bit memory in the STRB1 portion of the memory map.

Each figure represents a boot from a different source medium. In Figure A-1, the boot table resides in the 32-bit IOSTRB memory; it is selected by driving the INT1 pin low after reset in the microcontroller/boot-loader mode. The boot table in Figure A-2 is stored in the 16-bit STRB0 memory (selected by INT0). The boot table in Figure A-3 resides in the 8-bit STRB1 memory (selected by INT2). The final example, shown in Figure A-4, represents the boot table stored in the host memory before being sent to the ’C32 over the serial port. Unlike a boot from memory, the serial port boot table omits the memory width control word from the beginning of the table.
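For a boot from memory, the memory width control word at the head of the table is a one-hot code. According to the comments in the boot loader listing (section B.3), bit 0 set means 1-bit-wide memory, bit 1 means 2-bit, and so on up to bit 5 for 32-bit memory, and the loader performs 32/width reads to build each 32-bit word. The hypothetical host-side helper below mirrors that decoding by taking the lowest set bit, as the loader's scan loop does; rejecting a word with none of bits 5 through 0 set is an added check, not loader behavior.

```python
def decode_width_word(word: int) -> tuple:
    """Decode a boot-table memory width control word.

    Returns (memory width in bits, reads per 32-bit word). The lowest
    set bit wins, mirroring the loader's bit-by-bit scan.
    """
    for bit in range(6):                # bits 0..5 -> 1, 2, 4, 8, 16, 32 bits
        if word & (1 << bit):
            width = 1 << bit
            return width, 32 // width
    raise ValueError(f"no width bit set in {word:#010x}")


if __name__ == "__main__":
    # Width words as they appear at the start of Figures A-2 and A-3.
    for w in (0x10, 0x08):
        width, reads = decode_width_word(w)
        print(f"{w:#04x}: {width}-bit memory, {reads} reads per word")
```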

The shaded areas of the boot table examples represent the contents of the individual blocks of code or data. The unshaded portions are the control words that instruct the boot loader program to transfer the blocks to the memory map.
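In narrower ROM, each 32-bit word is stored as consecutive pieces, least significant piece first. For example, in Figure A-2 the half-words 00F8h (at 1002h) and 1000h (at 1003h) form the IOSTRB value 100000F8h, and in Figure A-3 the bytes F8h, 00h, 00h, 10h at 900004h through 900007h form the same value. The sketch below reassembles words that way on a host; it is an illustration of the layout, not the boot loader's own code, and the function name is hypothetical.

```python
def assemble_words(pieces, width_bits):
    """Reassemble 32-bit words from ROM pieces stored least significant
    piece first (width_bits = 8, 16, or 32)."""
    if width_bits not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32 bits")
    per_word = 32 // width_bits
    mask = (1 << width_bits) - 1
    if len(pieces) % per_word:
        raise ValueError("piece count is not a whole number of words")
    words = []
    for i in range(0, len(pieces), per_word):
        word = 0
        for k, piece in enumerate(pieces[i:i + per_word]):
            word |= (piece & mask) << (k * width_bits)  # low piece first
        words.append(word)
    return words


if __name__ == "__main__":
    print([f"{w:08X}h" for w in assemble_words([0x00F8, 0x1000], 16)])
    print([f"{w:08X}h" for w in assemble_words([0x11, 0xDD, 0xCC, 0xBB], 8)])
```

The same rule applies to 32-bit block data, such as Block 2, which appears as DD11h followed by BBCCh in the 16-bit table.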
Figure A-1. Boot From a 32-Bit-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM


Figure A-2. Boot From a 16-Bit-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot         Destination   Block
address    table        address       data
001 000    10
001 001    00
001 002    00F8
001 003    1000
001 004    10F8
001 005    2005
001 006    10F8
001 007    3000
001 008    6
001 009    0
001 00A    1400
001 00B    0000
001 00C    F864
001 00D    0510
001 00E    AA11         001 400       AA11         Block 1
001 00F    AA22         001 401       AA22
001 010    AA33         001 402       AA33
001 011    AA44         001 403       AA44
001 012    AA55         001 404       AA55
001 013    AA66         001 405       AA66
001 014    4
001 015    0
001 016    0400
001 017    0081
001 018    F860
001 019    0000
001 01A    DD11         810 400       BBCC DD11    Block 2
001 01B    BBCC
001 01C    DD22         810 401       BBCC DD22
001 01D    BBCC
001 01E    DD33         810 402       BBCC DD33
001 01F    BBCC
001 020    DD44         810 403       BBCC DD44
001 021    BBCC
001 022    6
001 023    0
001 024    0400
001 025    0088
001 026    F864
001 027    0510
001 028    EE11         880 400       EE11         Block 3
001 029    EE22         880 401       EE22
001 02A    EE33         880 402       EE33
001 02B    EE44         880 403       EE44
001 02C    EE55         880 404       EE55
001 02D    EE66         880 405       EE66
001 02E    8
001 02F    0
001 030    0400
001 031    0090
001 032    F868
001 033    0010
001 034    00F1         900 400       F1           Block 4
001 035    00F2         900 401       F2
001 036    00F3         900 402       F3
001 037    00F4         900 403       F4
001 038    00F5         900 404       F5
001 039    00F6         900 405       F6
001 03A    00F7         900 406       F7
001 03B    00F8         900 407       F8
001 03C    0
001 03D    0


Figure A-3. Boot From a Byte-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot         Destination   Block
address    table        address       data
900 000    08
900 001    00
900 002    00
900 003    00
900 004    F8
900 005    00
900 006    00
900 007    10
900 008    F8
900 009    10
900 00A    05
900 00B    20
900 00C    F8
900 00D    10
900 00E    00
900 00F    30
900 010    6
900 011    0
900 012    0
900 013    0
900 014    00
900 015    14
900 016    00
900 017    00
900 018    64
900 019    F8
900 01A    10
900 01B    05
900 01C    11           001 400       AA11         Block 1
900 01D    AA
900 01E    22           001 401       AA22
900 01F    AA
900 020    33           001 402       AA33
900 021    AA
900 022    44           001 403       AA44
900 023    AA
900 024    55           001 404       AA55
900 025    AA
900 026    66           001 405       AA66
900 027    AA
900 028    4
900 029    0
900 02A    0
900 02B    0
900 02C    00
900 02D    04
900 02E    81
900 02F    00
900 030    60
900 031    F8
900 032    00
900 033    00
900 034    11           810 400       BBCC DD11    Block 2
900 035    DD
900 036    CC
900 037    BB
900 038    22           810 401       BBCC DD22
900 039    DD
900 03A    CC
900 03B    BB
900 03C    33           810 402       BBCC DD33
900 03D    DD
900 03E    CC
900 03F    BB
900 040    44           810 403       BBCC DD44
900 041    DD
900 042    CC
900 043    BB
900 044    6
900 045    0
900 046    0
900 047    0
900 048    00
900 049    04
900 04A    88
900 04B    00
900 04C    64
900 04D    F8
900 04E    10
900 04F    05
900 050    11           880 400       EE11         Block 3
900 051    EE
900 052    22           880 401       EE22
900 053    EE
900 054    33           880 402       EE33
900 055    EE
900 056    44           880 403       EE44
900 057    EE
900 058    55           880 404       EE55
900 059    EE
900 05A    66           880 405       EE66
900 05B    EE
900 05C    8
900 05D    0
900 05E    0
900 05F    0
900 060    00
900 061    04
900 062    90
900 063    00
900 064    68
900 065    F8
900 066    10
900 067    00
900 068    F1           900 400       F1           Block 4
900 069    F2           900 401       F2
900 06A    F3           900 402       F3
900 06B    F4           900 403       F4
900 06C    F5           900 404       F5
900 06D    F6           900 405       F6
900 06E    F7           900 406       F7
900 06F    F8           900 407       F8
900 070    0
900 071    0
900 072    0
900 073    0


Figure A-4. Boot From Serial Port to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot         Destination   Block
address    table        address       data
808 04C    1000 00F8
808 04C    2005 10F8
808 04C    3000 10F8
808 04C    6
808 04C    0000 1400
808 04C    0510 F864
808 04C    0000 BB1D    001 400       BB1D         Block 1
808 04C    0000 BB2D    001 401       BB2D
808 04C    0000 BB3D    001 402       BB3D
808 04C    0000 BB4D    001 403       BB4D
808 04C    0000 BB5D    001 404       BB5D
808 04C    0000 BB6D    001 405       BB6D
808 04C    4
808 04C    0081 0400
808 04C    0000 F860
808 04C    DDCC BB1E    810 400       DDCC BB1E    Block 2
808 04C    DDCC BB2E    810 401       DDCC BB2E
808 04C    DDCC BB3E    810 402       DDCC BB3E
808 04C    DDCC BB4E    810 403       DDCC BB4E
808 04C    6
808 04C    0088 0400
808 04C    0510 F864
808 04C    0000 BB1F    880 400       BB1F         Block 3
808 04C    0000 BB2F    880 401       BB2F
808 04C    0000 BB3F    880 402       BB3F
808 04C    0000 BB4F    880 403       BB4F
808 04C    0000 BB5F    880 404       BB5F
808 04C    0000 BB6F    880 405       BB6F
808 04C    8
808 04C    0090 0400
808 04C    0010 F868
808 04C    0000 0010    900 400       10           Block 4
808 04C    0000 0020    900 401       20
808 04C    0000 0030    900 402       30
808 04C    0000 0040    900 403       40
808 04C    0000 0050    900 404       50
808 04C    0000 0060    900 405       60
808 04C    0000 0070    900 406       70
808 04C    0000 0080    900 407       80
808 04C    0000 0000
Appendix B


TMS320C32 Boot Loader Operations 


This appendix contains the source code and boot loader opcodes for the ’C32. It also describes the on-chip boot loader program that initializes the DSP system following power up or reset.


Topic Page 

B.1 TMS320C32 Boot Loader Source Code Description .B-2 

B.2 TMS320C32 Boot Loader Opcodes .B-4 

B.3 Boot Loader Source Code Listing .B-6 


B.1 TMS320C32 Boot Loader Source Code Description 

Figure B-1 shows the boot loader program flowchart. The shaded areas represent portions of code; the square shapes depict registers containing data. The boot loader reads the boot table from one of three memory locations (1000h, 810000h, 900000h) or from the serial port. The boot loader processes each block of the boot table separately. First, the words of the program or data are assembled from bytes (or half-words). The assembled words are then written to their destinations one at a time. Each block can be transferred to any memory address range within the memory map. Each block in the boot table is preceded by three control words: block size, destination address, and strobe control register value. The boot loader ends execution when it finds a 0 for the size of the next block. At that point, it initializes the three strobe control registers and branches to the first instruction of the first block. For that reason, the first boot table block must always contain program code, not data. For more information about the boot loader operation, see section B.3, Boot Loader Source Code Listing, on page B-6, and the TMS320C3x User’s Guide.
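The control-word sequence described above can be walked on a host to sanity-check a table before it is programmed or transmitted. This sketch assumes the table is already a list of 32-bit words in the order a serial boot receives them (Figure A-4): three strobe values (IOSTRB, STRB0, STRB1), then, for each block, size, destination address, and strobe control word followed by `size` data words, and finally a size of 0. For a memory boot, the memory width word would be stripped first and narrower data unpacked, which this sketch does not attempt. The strobe register address is computed as 808000h OR (control word AND 6Ch), following the `AND 6Ch,R1` / `OR3 AR7,R1,AR4` sequence in the listing. All names are hypothetical.

```python
from typing import List, NamedTuple

PERIPH_BASE = 0x808000  # peripheral base the loader keeps in AR7


class Block(NamedTuple):
    size: int          # number of data words after the control words
    dest: int          # destination address of the first data word
    strobe_ctrl: int   # strobe control word as stored in the table
    strobe_reg: int    # strobe register written: base | (ctrl & 6Ch)
    data: List[int]


def walk_serial_boot_table(words: List[int]):
    """Split a serial-form boot table into strobe values and blocks.

    Returns (strobes, blocks, entry_point): strobes holds the IOSTRB,
    STRB0, and STRB1 values restored after the last block; entry_point
    is the destination address of the first block.
    """
    if len(words) < 4:
        raise ValueError("table too short")
    strobes = list(words[:3])
    blocks: List[Block] = []
    i = 3
    while True:
        if i >= len(words):
            raise ValueError("missing terminating zero block size")
        size = words[i]
        if size == 0:
            break
        end = i + 3 + size
        if end > len(words):
            raise ValueError(f"block at word {i} runs past the end of the table")
        dest, ctrl = words[i + 1], words[i + 2]
        blocks.append(Block(size, dest, ctrl,
                            PERIPH_BASE | (ctrl & 0x6C),
                            list(words[i + 3:end])))
        i = end
    if not blocks:
        raise ValueError("no blocks: the loader would have nothing to execute")
    return strobes, blocks, blocks[0].dest


# Boot table words from Figure A-4.
FIG_A4 = [
    0x100000F8, 0x200510F8, 0x300010F8,
    6, 0x00001400, 0x0510F864,
    0x0000BB1D, 0x0000BB2D, 0x0000BB3D, 0x0000BB4D, 0x0000BB5D, 0x0000BB6D,
    4, 0x00810400, 0x0000F860,
    0xDDCCBB1E, 0xDDCCBB2E, 0xDDCCBB3E, 0xDDCCBB4E,
    6, 0x00880400, 0x0510F864,
    0x0000BB1F, 0x0000BB2F, 0x0000BB3F, 0x0000BB4F, 0x0000BB5F, 0x0000BB6F,
    8, 0x00900400, 0x0010F868,
    0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80,
    0,
]

if __name__ == "__main__":
    strobes, blocks, entry = walk_serial_boot_table(FIG_A4)
    print("strobe values:", ", ".join(f"{s:08X}h" for s in strobes))
    for n, b in enumerate(blocks, 1):
        print(f"block {n}: {b.size} words -> {b.dest:06X}h "
              f"(strobe reg {b.strobe_reg:06X}h)")
    print(f"execution starts at {entry:06X}h")
```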
Figure B-1. TMS320C32 Boot Loader Program Flowchart

[Flowchart not reproduced. Entry point: boot loader execution (caused by MCBL/MP high after reset).]

† Handshake mode is enabled by setting the I/OXF0 bit of the IOF register to 1 when INT3 and any of the INT2, INT1, or INT0 signals are asserted following reset.

Note: Shaded boxes indicate operations; white boxes indicate registers.


B.2 TMS320C32 Boot Loader Opcodes 

Table B-1 lists the ’C32 boot loader opcodes (shown in boldface type). Each entry is a complete 32-bit instruction word; its opcode field, in the most significant bits of the word, describes the type of operation and the combination of operands interpreted by the central processing unit (CPU).
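To relate the words in Table B-1 to the listing in section B.3, the sketch below splits a general-format instruction word into its fields: bits 31-23 hold the operation code, bits 22-21 the addressing mode G (00 register, 11 immediate), bits 20-16 the destination register, and bits 15-0 the source operand. The mnemonic table is an illustrative subset obtained by matching Table B-1 words against the listing (for example, 086F4040h at address 45h is LDI 4040h,AR7); branch, call, conditional-load, store, and three-operand words use other formats that this sketch deliberately does not decode. Names are hypothetical.

```python
# Register numbering used in the destination/source fields.
REGS = ([f"R{n}" for n in range(8)] + [f"AR{n}" for n in range(8)]
        + ["DP", "IR0", "IR1", "BK", "SP", "ST", "IE", "IF", "IOF",
           "RS", "RE", "RC"])

# Illustrative subset of operation codes, matched by comparing
# Table B-1 words with the B.3 listing.
OPCODES = {0x04: "ADDI", 0x05: "AND", 0x09: "CMPI", 0x10: "LDI",
           0x13: "LSH", 0x20: "OR", 0x30: "SUBI", 0x34: "TSTB"}
UNSIGNED_IMM = {"AND", "OR", "TSTB"}  # logical ops: immediate not sign-extended


def fields(word: int):
    """Split a general-format word into (opcode, G, dst, src)."""
    return ((word >> 23) & 0x1FF, (word >> 21) & 0x3,
            (word >> 16) & 0x1F, word & 0xFFFF)


def fmt_imm(value: int) -> str:
    """Negatives and 0-9 in decimal, everything else as hex with 'h'."""
    if value <= 9:
        return str(value)
    text = f"{value:X}h"
    return "0" + text if text[0] in "ABCDEF" else text


def disassemble(word: int):
    """Return e.g. 'LDI 4040h,AR7' for register (G=00) or immediate
    (G=11) forms of the opcodes above; None for anything else."""
    op, g, dst, src = fields(word)
    name = OPCODES.get(op)
    if name is None or dst >= len(REGS):
        return None
    if g == 0b00:                      # register source in the low bits
        if src >= len(REGS):
            return None
        operand = REGS[src]
    elif g == 0b11:                    # 16-bit immediate
        imm = src
        if name not in UNSIGNED_IMM and imm & 0x8000:
            imm -= 0x10000             # sign-extend arithmetic/shift immediates
        operand = fmt_imm(imm)
    else:                              # direct (01) and indirect (10) not handled
        return None
    return f"{name} {operand},{REGS[dst]}"


if __name__ == "__main__":
    for addr, word in [(0x45, 0x086F4040), (0x46, 0x09EF0009),
                       (0x47, 0x08740023), (0x48, 0x1014000F),
                       (0x49, 0x0871FFFF), (0x4D, 0x6A05004F)]:
        print(f"{addr:08X}  {word:08X}  {disassemble(word) or '(not decoded)'}")
```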
Table B-1. TMS320C32 Boot Loader Opcodes

ADDRESS   OPCODE    ADDRESS   OPCODE    ADDRESS   OPCODE    ADDRESS   OPCODE
00000000  00000045  00000034  00000000  00000068  1a660001  0000009d  086800a7
00000001  00000000  00000035  00000000  00000069  6a060004  0000009e  08650000
00000002  00000000  00000036  00000000  0000006a  09e6ffff  0000009f  08620000
00000003  00000000  00000037  00000000  0000006b  09eeffff  000000a0  080a000f
00000004  00000000  00000038  00000000  0000006c  09e50001  000000a1  08600111
00000005  00000000  00000039  00000000  0000006d  6a00fffa  000000a2  15400743
00000006  00000000  0000003A  00000000  0000006e  186e0002  000000a3  08670a30
00000007  00000000  0000003B  00000000  0000006f  04ee0000  000000a4  09e70010
00000008  00000000  0000003C  00000000  00000070  6a070002  000000a5  15470740
00000009  00000000  0000003D  00000000  00000071  72000053  000000a6  6a00ffcc
0000000A  00000000  0000003E  00000000  00000072  6f80fffe  000000a7  1a770020
0000000B  00000000  0000003F  00000000  00000073  70000008  000000a8  6a05fffe
0000000C  00000000  00000040  00000000  00000074  15410704  000000a9  02f70fdf
0000000D  00000000  00000041  00000000  00000075  70000008  000000aa  0841074c
0000000E  00000000  00000042  00000000  00000076  15410706  000000ab  78800000
0000000F  00000000  00000043  00000000  00000077  70000008  000000ac  08630003
00000010  00000000  00000044  00000000  00000078  15410708  000000ad  08730001
00000011  00000000  00000045  086f4040  00000079  70000008  000000ae  09930005
00000012  00000000  00000046  09ef0009  0000007a  08010001  000000af  18730001
00000013  00000000  00000047  08740023  0000007b  6a060007  000000b0  080e0003
00000014  00000000  00000048  1014000f  0000007c  08400704  000000b1  026e0001
00000015  00000000  00000049  0871ffff  0000007d  15400760  000000b2  09ee0003
00000016  00000000  0000004a  08000017  0000007e  08400706  000000b3  08000005
00000017  00000000  0000004b  02e0000f  0000007f  15400764  000000b4  04e00001
00000018  00000000  0000004c  04e00008  00000080  08400708  000000b5  6a050003
00000019  00000000  0000004d  6a05004f  00000081  15400768  000000b6  09e0ffff
0000001A  00000000  0000004e  080a000f  00000082  68000012  000000b7  09eeffff
0000001B  00000000  0000004f  026a0060  00000083  081b0001  000000b8  6a00fffb
0000001C  00000000  00000050  1a600004  00000084  187b0001  000000b9  186e0001
0000001D  00000000  00000051  536b4080  00000085  70000008  000000ba  08600000
0000001E  00000000  00000052  6a060008  00000086  080d0001  000000bb  08610000
0000001F  00000000  00000053  026a0004  00000087  04f10000  000000bc  02740003
00000020  00000000  00000054  1a600001  00000088  5312000d  000000bd  72000007
00000021  00000000  00000055  536b0008  00000089  53710000  000000be  18740003
00000022  00000000  00000056  6a060004  0000008a  70000008  000000bf  21871306
00000023  00000000  00000057  026a0004  0000008b  08040001  000000c0  09870000
00000024  00000000  00000058  1a600004  0000008c  02e1006c  000000c1  10010007
00000025  00000000  00000059  536b4800  0000008d  258c010f  000000c2  02000005
00000026  00000000  0000005a  6a05ffef  0000008e  09e4fff8  000000c3  6f80fff8
00000027  00000000  0000005b  1a600008  0000008f  08030004  000000c4  78800000
00000028  00000000  0000005c  6a050002  00000090  09e3fff0  000000c5  1a780002
00000029  00000000  0000005d  1a780080  00000091  02e30003  000000c6  1542c200
0000002A  00000000  0000005e  08780006  00000092  1a61000c  000000c7  6a060002
0000002B  00000000  0000005f  0862000f  00000093  52e30003  000000c8  08462301
0000002C  00000000  00000060  09e20010  00000094  04e50000  000000c9  78800000
0000002D  00000000  00000061  1042c200  00000095  52e900a7  000000ca  1b40c700
0000002E  00000000  00000062  1542c200  00000096  536900ad  000000cb  1a780080
0000002F  00000000  00000063  09eb0009  00000097  6400009b  000000cc  6a06fffd
00000030  00000000  00000064  086800ac  00000098  70000009  000000cd  08462301
00000031  00000000  00000065  08650001  00000099  1544c400  000000ce  08780002
00000032  00000000  00000066  086e0020  0000009a  0c800000  000000cf  1a780080
00000033  00000000  00000067  7200005d  0000009b  15412501  000000d0  6a05fffe
                                        0000009c  6a00ffdc  000000d1  08780006
                                                            000000d2  78800000
B.3 Boot Loader Source Code Listing

*=====================================================================*
* C32BOOT - TMS320C32 BOOT LOADER PROGRAM (143 words)        March-96 *
* (C) COPYRIGHT TEXAS INSTRUMENTS INCORPORATED, 1994          v.27    *
*=====================================================================*
*
* NOTE:
*
* 1. Following device reset, the program waits for an external
*    interrupt. The interrupt type determines the initial address
*    from which the boot loader starts loading the boot table to the
*    destination memory:
*
*    INTERRUPT PIN    BOOT TABLE START ADDRESS    BOOT SOURCE
*    INT0             1000h   (STRB0)             P_PORT
*    INT1             810000h (IOSTRB)            P_PORT
*    INT2             900000h (STRB1)             P_PORT
*    INT3             80804Ch (sport0 Rx)         SERIAL
*    INT0 and INT3    1000h   (STRB0)             ASYNC P_PORT, XF0/XF1
*    INT1 and INT3    810000h (IOSTRB)            ASYNC P_PORT, XF0/XF1
*    INT2 and INT3    900000h (STRB1)             ASYNC P_PORT, XF0/XF1
*
*    If INT3 is asserted together with INT2, or INT1, or INT0 following
*    reset, that indicates that the boot table is to be read
*    asynchronously from EPROM using pins XF0 and XF1 for handshaking.
*    The handshaking protocol assumes that the data ready signal
*    generated by the host arrives through pin XF1. The data
*    acknowledge signal is output from the C32 on pin XF0. Both
*    signals are active low. The C32 continuously toggles the TACK
*    signal while waiting for the host to assert data ready signal
*    (pin XF1).
*
* 2. The boot operation involves transfer of one or more source
*    blocks from the boot media to the destination memory. The block
*    structure of the boot table serves the purpose of distributing
*    the source data/program among different memory spaces. Each
*    block is preceded by several 32-bit control words describing
*    the block contents to the boot loader program.
*
* 3. When loading from the serial port, the boot loader reads the source
*    data/program and writes it to the destination memory. There is
*    only one way to read the serial port. When loading from EPROM,
*    however, there are 4 ways to read and assemble the
*    source contents, depending on the width of boot memory and the
*    size of the program/data being transferred. Because there is a
*    possibility that reads and writes can span the same STRB space,
*    the boot loader loads the appropriate STRB control registers
*    before each read and write.
*
* 4. If the boot source is an EPROM whose physical width is less than
*    32 bits, the physical interface of the EPROM device(s) to the
*    processor must be the same as that of the 32-bit interface.
*    (This involves a specific connection to the C32's strobe and
*    address signals.) The reason for such an arrangement is that,
*    to function properly, the boot loader program always expects
*    32-bit data from 32-bit-wide memory during the boot load
*    operation. Valid boot EPROM widths are: 1, 2, 4, 8, 16,
*    and 32 bits.
*
* 5. A single source block cannot cross STRB boundaries. For
*    example, its destination cannot overlap STRB0 space and IOSTRB
*    space. Additionally, all of the destination addresses of a
*    single source block must reside in physical memory of the
*    same width. It is not permitted to mix program and data in the
*    same source block.
*
* 6. The boot loader stops boot operation when it finds a 0 in the
*    block size control word. Therefore, each boot table must
*    end with a 0, prompting the boot loader to branch to the
*    first address of the first block and start program execution
*    from that location.
*
*=====================================================================*
* 'C32 boot loader program register assignments, and altered memory
* locations
*=====================================================================*
*
* AR7 - peripheral memory map          IOF - XF0 (handshake - data acknowledge)
* AR0 - read cntrl data subr pointer   IOF - XF1 (handshake - data ready)
* AR1 - read block data/prg subr pointer
*
* R2  - read STRB value                R4  - write STRB value
* AR2 - read STRB pointer              AR4 - write STRB pointer
* AR3 - read data/prg pointer          AR5 - write data/prg pointer
*
*             read --> R1 --> write
*
* IR0 - EXEC start flag                stack  - 808024h - TIM0 cnt reg
* IR1 - EXEC start address                      808028h - TIM0 per reg
*
*                                      IOSTRB - 808004h - DMA0 src reg
* R3  - data size                      STRB0  - 808006h - DMA0 dst reg
* R5  - mem width                      STRB1  - 808008h - DMA0 cnt reg
*
* R6  - memory read value              AR6,R7,R0,BK - scratch registers
*
*=====================================================================*

reset   .word   start           ; reset vector
        .space  44h             ; program starts @45h

*=====================================================================*
* Initialize registers: 808000h --> AR7, 808023h --> SP, -1 --> IR0
*=====================================================================*

start   LDI     4040h,AR7       ; load peripheral memory map
        LSH     9,AR7           ; base address = 808000h
        LDI     23h,SP          ; initialize stack pointer to
        OR      AR7,SP          ; 808023h (timer counter - 1)
        LDI     -1,IR0          ; reset exec start addr flag

*=====================================================================*
* Test for INT3 and, if set exclusively, proceed with serial
* boot load. Else, load AR3 with 1000h if INT0, 810000h if INT1,
* 900000h if INT2. Also load the appropriate boot strobe pointer --> AR2
* and force the boot strobe value to reflect 32-bit memory width.
* If (INT0 or INT1 or INT2) and INT3, turn on the handshake mode.
*=====================================================================*

wait1   LDI     IF,R0
        AND     0Fh,R0          ; clean
        CMPI    8,R0            ; test for INT3
        BEQ     serial          ;*******; serial boot load mode
        LDI     AR7,AR2
        ADDI    60h,AR2         ; 808060h (IOSTRB) --> AR2
        TSTB    2,R0            ; test for INT1
        LDINZ   4080h,AR3       ; 810000h / 2**9
        BNZ     exits           ;*******;
        ADDI    4,AR2           ; 808064h (STRB0) --> AR2
        TSTB    1,R0            ; test for INT0
        LDINZ   8,AR3           ; 001000h / 2**9
        BNZ     exits           ;*******;
        ADDI    4,AR2           ; 808068h (STRB1) --> AR2
        TSTB    4,R0            ; test for INT2
        LDINZ   4800h,AR3       ; 900000h / 2**9
        BZ      wait1           ;*******;
exits   TSTB    8,R0            ; test#1 - INT3 asserted
        BZ      exit2           ;*******;
        TSTB    80h,IOF         ; test#2 - INXF1 low (not used)
        LDI     6,IOF           ; enable handshake mode if test#1 passed
exit2   LDI     0Fh,R2
        LSH     16,R2           ; force boot data size to 32
        OR      *AR2,R2         ; force boot mem width to 32
        STI     R2,*AR2
        LSH     9,AR3           ; boot mem start addr --> AR3

*                                                    xx000001 - 1 bit
*==================================================  xx000010 - 2 bit
* Process MEMORY WIDTH control word (32 bits long)   xx000100 - 4 bit
*==================================================  xx001000 - 8 bit
*                                                    xx010000 - 16 bit
*                                                    xx100000 - 32 bit

        LDI     read_mc,AR0     ; use memory to read cntrl words
                                ; read_mc --> AR0
        LDI     1,R5            ; mem width = 1 (init)
        LDI     32,AR6          ; mem reads = 32 (init)
        CALLU   read_m          ; read memory once (1st read)
loop2   TSTB    1,R6
        BNZ     label4
        LSH     -1,R6           ; look at next bit
        LSH     -1,AR6          ; decr mem reads
        LSH     1,R5            ; incr mem width --> R5
        BU      loop2           ;*******;
label4  SUBI    2,AR6
        CMPI    0,AR6           ; set flags
        BN      strobes         ;*******; total # of mem reads = 32/R5
label5  CALLU   read_m          ; read memory once
        DBU     AR6,label5      ;****;

*=====================================================================*
* Read and save IOSTRB, STRB0 & STRB1 (to be loaded at end of
* boot load)
*=====================================================================*

strobes CALLU   AR0
        STI     R1,*+AR7(4)     ; IOSTRB --> (DMA src)
        CALLU   AR0
        STI     R1,*+AR7(6)     ; STRB0 --> (DMA dst)
        CALLU   AR0
        STI     R1,*+AR7(8)     ; STRB1 --> (DMA cnt)

*=====================================================================*
* Process block size control word (# of bytes, half-words, or words
* after STRB cntrl)
*=====================================================================*

block   CALLU   AR0             ; read boot memory cntrl word
        LDI     R1,R1           ; is this the last block?
        BNZ     label2          ; no, go around
        LDI     *+AR7(4),R0     ; (DMA src)
        STI     R0,*+AR7(60h)   ; restore IOSTRB
        LDI     *+AR7(6),R0     ; (DMA dst)
        STI     R0,*+AR7(64h)   ; restore STRB0
        LDI     *+AR7(8),R0     ; (DMA cnt)
        STI     R0,*+AR7(68h)   ; restore STRB1
        BU      IR1             ; branch to start of program
label2  LDI     R1,RC           ; setup transfer loop
        SUBI    1,RC            ; RC - 1 --> RC

*=====================================================================*
* Process block destination address, save start address of first
* block
*=====================================================================*

        CALLU   AR0             ; read boot memory cntrl word
        LDI     R1,AR5          ; set dest addr --> AR5
        CMPI    0,IR0           ; look at EXEC start addr flag
        LDINZ   AR5,IR1         ; if -1, EXEC start addr --> IR1
        LDINZ   0,IR0           ; set EXEC start addr flag

*=====================================================================*
* Process block destination strobe control (sss...sss 0110 xx00)
* (For internal destination, this word must be 0 or 60h. The first
* case results in 0 --> DMA control register, in second case 0 -->
* IOSTRB register.)
*                                            - strb value -
*                                            00 - IOSTRB
*                                            01 - STRB0
*                                            10 - STRB1
*=====================================================================*

        CALLU   AR0
        LDI     R1,R4
        AND     6Ch,R1
        OR3     AR7,R1,AR4      ; dest mem strb pntr --> AR4
        LSH     -8,R4           ; dest memory strobe --> R4
        LDI     R4,R3
        LSH     -16,R3
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AND 3,R3 ; dest data size —> R3 

TSTB OCh,Rl ; (lOSTRB case) 

LDIZ 3,R3 

■*■= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =■*■ 

^ Look at R5 and choose serial or memory read for block data/program 
■:^ =================================================================== 


CMPI 0,R5 

LDIEQ read_sO,ARl ; read serial portO 

LDINE read_mb,ARl ; read memory 

*=====================================================================*
* Transfer one block of data or program
*=====================================================================*

        RPTB    loop4
        CALLU   AR1             ; read data/prg
        STI     R4,*AR4         ; set write strobe
        NOP                     ; pipeline
loop4   STI     R1,*AR5++       ; write data/prg!!!!!!!!!!
        BU      block           ; process next block

*=====================================================================*
* Load R5 with 0, load read_s0 to AR0 and initialize serial port 0
*=====================================================================*

serial  LDI     read_s0,AR0     ; use serial to read cntrl words
        LDI     0,R5            ; memory WIDTH = serial
        LDI     0,R             ; dummy
        LDI     AR7,AR2         ; dummy
        LDI     111h,R0         ; 0000111h -> R0
        STI     R0,*+AR7(43h)   ; set CLKR,DR,FSR as serial
        LDI     0A30h,R7        ; port pins
        LSH     16,R7           ; A300000h -> R7
        STI     R7,*+AR7(40h)   ; set serial global cntrl reg
        BU      strobes         ; process first block

*=====================================================================*
* Read a single value from serial or boot memory. The number of
* memory reads depends on memory width and data size. R1 returns the
* read value. (Serial sim: NOP -> BZ read_s0 & LDI @4000H,R1 -> LDI
* *+AR7(4Ch),R1)
*=====================================================================*

read_s0 TSTB    20h,IF          ; look at RINT0 flag
        BZ      read_s0         ; wait for receive buffer full
        AND     0FDFh,IF        ; reset interrupt flag
        LDI     *+AR7(4Ch),R1   ; read data -> R1
        RETSU




read_mc LDI     3,R3            ; data size = 32, 3 -> R3
read_mb LDI     1,BK            ; 00000001 (ex: mem width=8)
        LSH     R5,BK           ; 00000100
        SUBI    1,BK            ; 000000FF = mask -> BK
        LDI     R3,AR6          ;  0 -   1 000   EXPAND
        ADDI    1,AR6           ;  1 -  10 000   DATA      -> AR6
        LSH     3,AR6           ; 11 - 100 000   SIZE
        LDI     R5,R0
loop3   CMPI    1,R0
        BEQ     exit1           ; DATA SIZE
        LSH     -1,R0           ; ---------  -> AR6
        LSH     -1,AR6          ; MEM WIDTH
        BU      loop3
exit1   SUBI    1,AR6
        LDI     0,R0            ; init shift value
        LDI     0,R1            ; init accumulator
loop1   ADDI    3,SP            ; 808027h -> SP
        CALLU   read_m          ; read memory once -> R6
        SUBI    3,SP            ; 808024h -> SP
        AND3    R6,BK,R7        ; apply mask
        LSH     R0,R7           ; shift
        OR      R7,R1           ; accumulate -> R1
        ADDI    R5,R0           ; increment shift value
        DBU     AR6,loop1       ; decrement # of chunks -> AR6
        RETSU


*=====================================================================*
* Perform a single memory read from the source boot table.
* Handshake enabled if IOXF0 bit of IOF reg is set, disabled when
* reset. IACK will pulse continuously if handshake enabled and data
* not ready (to achieve zero-glue interface when connecting to a C40
* comm-port)
*=====================================================================*

read_m  TSTB    2,IOF           ; handshake mode enabled ?
        STI     R2,*AR2         ; set read strobe !!!!!!!!!!!!!
        BNZ     loop5           ; yes, jump over
        LDI     *AR3++,R6       ; no, just read memory & return
        RETSU

loop5   IACK    *AR7            ; intrnl dummy read pulses IACK
        TSTB    80h,IOF         ; wait for data ready
        BNZ     loop5           ; (XF1 low from host)
        LDI     *AR3++,R6       ; read memory once -> R6
        LDI     2,IOF           ; assert data acknowledge
                                ; (XF0 low to host)
loop6   TSTB    80h,IOF         ; wait for data not ready
        BZ      loop6           ; (XF1 high from host)
        LDI     6,IOF           ; deassert data acknowledge
        RETSU                   ; (XF0 high to host)
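The control-word decoding and the chunk-assembly loop in read_mb can be restated in C to make the bit manipulation easier to follow. This is a minimal sketch, not TI code: the names decode_strobe_word, assemble_value, read_next, struct strobe_cfg, and struct cursor are hypothetical; AR7 is assumed to hold the peripheral base address 808000h (as the *+AR7(40h) and *+AR7(4Ch) serial-port accesses suggest); and the read_once callback stands in for read_m. The data-size codes follow the listing's EXPAND DATA SIZE comments (0 = 8, 1 = 16, 3 = 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the destination strobe control word
 * (sss...sss 0110 xx00): AND 6Ch keeps the register offset, OR3 with
 * AR7 (assumed 808000h) forms the register address, and LSH -8 yields
 * the value written to that register. */
struct strobe_cfg {
    uint32_t reg_addr;        /* 808060h, 808064h, or 808068h */
    uint32_t reg_value;       /* control word >> 8 */
    unsigned data_size_code;  /* 0 = 8, 1 = 16, 3 = 32 bits */
};

static struct strobe_cfg decode_strobe_word(uint32_t word)
{
    struct strobe_cfg c;
    c.reg_addr = 0x808000u | (word & 0x6Cu);
    c.reg_value = word >> 8;
    c.data_size_code = (c.reg_value >> 16) & 3u;  /* LSH -16, AND 3 */
    if ((word & 0x0Cu) == 0)                      /* IOSTRB case: TSTB 0Ch / LDIZ 3 */
        c.data_size_code = 3u;
    return c;
}

/* Hypothetical model of read_mb: combine mem_bits-wide reads into one
 * data_bits-wide value, least significant chunk first. */
typedef uint32_t (*read_fn)(void *ctx);

static uint32_t assemble_value(unsigned data_bits, unsigned mem_bits,
                               read_fn read_once, void *ctx)
{
    assert(mem_bits == 8 || mem_bits == 16 || mem_bits == 32);
    /* BK mask = (1 << width) - 1; 1u << 32 is undefined in C, so special-case it */
    uint32_t mask = (mem_bits == 32) ? 0xFFFFFFFFu : ((1u << mem_bits) - 1u);
    /* DATA SIZE / MEM WIDTH chunks; the loop body always runs at least once */
    unsigned chunks = (data_bits > mem_bits) ? data_bits / mem_bits : 1u;
    uint32_t value = 0;
    unsigned shift = 0;
    for (unsigned i = 0; i < chunks; i++) {
        value |= (read_once(ctx) & mask) << shift;  /* apply mask, shift, accumulate */
        shift += mem_bits;                          /* increment shift value */
    }
    return value;
}

/* Example source: successive boot-table words held in an array. */
struct cursor { const uint32_t *p; };

static uint32_t read_next(void *ctx)
{
    struct cursor *c = ctx;
    return *c->p++;
}
```

For example, four byte reads 78h, 56h, 34h, 12h from an 8-bit boot memory assemble into 12345678h, matching the least-significant-chunk-first order in which loop1 shifts and accumulates.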
Memory Access for C Programs


This appendix describes the two memory models that can be used to access 
data when programming in C. 

In the small model (default), the compiler uses direct addressing to access
data in memory. Direct addressing places 16 bits of address in the instruction
opcode; this 16-bit offset is combined with the 8-bit data page held in the DP
register (set beforehand) to form the address of the data. Because only the
16-bit offset changes from access to access, the small model can access at most
64K words. However, this mode produces fast and compact code because each data
access uses only a single instruction (see Figure C-1).

The big model is not limited to 64K words because each data access in C
explicitly sets the data page pointer (DP register). The 8-bit data page and
16-bit direct address are combined for a total address reach of 16M words, but
at a price of two instructions per data access (see Figure C-1).
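The 64K-word and 16M-word limits follow directly from how a direct address is formed. As a minimal sketch (direct_address and the two reach constants are hypothetical names, not part of the TI tools), the operand address is the 8-bit DP value followed by the 16-bit offset from the opcode; the big-model pair LDP @880001h,DP / LDI @880001h,R0 in Figure C-1 corresponds to DP = 88h and offset 0001h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 24-bit address of a direct-addressing access
 * is the 8-bit data page (DP) followed by the 16-bit opcode offset. */
static uint32_t direct_address(uint8_t dp, uint16_t offset)
{
    return ((uint32_t)dp << 16) | offset;  /* 8 + 16 = 24 address bits */
}

/* Words reachable with DP fixed (small model) versus DP reloaded
 * before every access (big model). */
#define SMALL_MODEL_REACH (1UL << 16)  /* 64K words */
#define BIG_MODEL_REACH   (1UL << 24)  /* 16M words */
```

The small model keeps DP fixed, so only the low 16 bits vary; the big model pays for the extra LDP so that all 24 bits can vary.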

Dynamically allocated memory can be used if the application needs a large
address reach, compact code size, and fast execution. The MALLOC function
from the runtime-support (RTS) library can be called at run time to reserve a
block of memory in the .SYSMEM section. MALLOC returns a pointer to the newly
allocated block. Any reference to that block of memory results in assembled
code that uses indirect addressing, in which the opcode specifies the auxiliary
register that holds the address of the operand (see Figure C-1). Code referring
to dynamically allocated memory is fast and has a 16M-word (24-bit) address
reach. The price is a one-time call to MALLOC for each dynamically allocated
array. For that reason, MALLOC is most efficient with large data arrays, where
the overhead of the call is insignificant compared to the large number of data
accesses to those arrays.
Figure C-1. Memory Allocation in C Programs

(a) Small model (default)
    • Static memory - assigned at compile time
    • Maximum size - 64K words
    • Fast execution
    [Diagram: TMS320C32 STRB connected to memory holding .bss (small) and .text]

    C statement    Equivalent assembly code
    C = A + B      LDI   @0FFFDh,R0
                   LDI   @0FFFEh,R1
                   ADDI  R0,R1
                   STI   R1,@0FFFh

(b) Big model (-mb option)
    • Static memory - assigned at compile time
    • Maximum size - 16M words
    • Slow execution
    [Diagram: TMS320C32 STRB connected to memory holding .bss (big) and .text]

    C statement    Equivalent assembly code
    C = A + B      LDP   @880001h,DP
                   LDI   @880001h,R0
                   LDP   @1002h,DP
                   LDI   @1002h,R1
                   ADDI  R0,R1
                   LDP   @8A0003h,DP
                   STI   R1,@8A0003h

(c) RTS library (MALLOC)
    • Dynamic memory - assigned at execution time
    • Maximum size - 16M words
    • Fast execution
    • Best for big arrays (one-time overhead - MALLOC call)
    [Diagram: TMS320C32 memory]


Figure C-2 shows how to use MALLOC to allocate a block of 32-bit memory
at run time. In this example, MALLOC is called three times to allocate memory
from the heap.

After each MALLOC call, other program functions can use the newly allocated
block of memory through the pointer BUFFER_32. The size of the heap (the pool
of all dynamically allocated memory) is defined in the linker command file with
the -heap option followed by the heap size. Every block allocated with a MALLOC
call comes from the heap, which occupies the .SYSMEM section. The SECTIONS
directive can then be used to map this section to an address range in physical
memory. (For more information, see the TMS320C3x/C4x Assembly Language Tools
User’s Guide or the TMS320C3x/C4x Optimizing C Compiler User’s Guide.)

Dynamically allocated memory provides the only method for a C program to 
access 8- or 16-bit wide memory. This means that physical memory that is less 
than 32 bits wide cannot be accessed using small or big model addressing. 
Instead, the MALLOC8 and MALLOC16 RTS library functions can allocate 
blocks of 8- and 16-bit wide memory. These routines work like the 32-bit 
MALLOC by returning pointers to 8- or 16-bit memory blocks. These can be 
used by code that follows the MALLOC call to access that memory (see 
Figure C-3 and Figure C-4). The 8-bit data allocated by MALLOC8 is placed 
in the .SYSM8 section by the linker, while the 16-bit data is deposited in the 
.SYSM16 section. The -heap8 and -heap16 linker options limit the total amount
of 8- or 16-bit memory that can be dynamically allocated into those sections.
(For more information, see the TMS320C3x/C4x Optimizing C Compiler User’s
Guide.)
Figure C-2. Dynamic Memory Allocation for TMS320C32 (One Block of 32-Bit Memory)

(a) C code

int *BUFFER_32;                          /* declare a pointer to a pool of 32-bit memory */
   .
   .
BUFFER_32 = MALLOC(2048 * sizeof(int));  /* allocate 2K words of memory   */
dsp_func4(BUFFER_32);                    /* use the above memory          */
   .
BUFFER_32 = MALLOC(512 * sizeof(int));   /* allocate 0.5K words of memory */
dsp_func5(BUFFER_32);                    /* use the above memory          */
   .
BUFFER_32 = MALLOC(1024 * sizeof(int));  /* allocate 1K words of memory   */
dsp_func6(BUFFER_32);                    /* use the above memory          */

(b) LINKER command file

-heap 0x4000                           /* set the size of the dynamic 32-bit memory section */
STRB_RAM  org = 0x1000, len = 0x8000   /* define physical 32-bit memory */
.sysmem > STRB_RAM                     /* assign logical section to physical memory */

(c) External memory
    [Diagram: TMS320C32 (’C31, ’C30) connected to 32-bit-wide memory]
Figure C-3. Dynamic Memory Allocation for TMS320C32 (One Block of 16-Bit Memory)

(a) C code

int *BUFFER_16;                      /* declare a pointer to a pool of 16-bit memory */

*(volatile int *)0x808064 = 0x5000;  /* STRB0 control register: data size = 16, memory width = 16 */

BUFFER_16 = MALLOC16(1024 * sizeof(int));  /* allocate 2K half-words of memory */
dsp_func4(BUFFER_16);                      /* use the above memory */

BUFFER_16 = MALLOC16(512 * sizeof(int));   /* allocate 1K half-words of memory */
dsp_func5(BUFFER_16);                      /* use the above memory */

BUFFER_16 = MALLOC16(2048 * sizeof(int));  /* allocate 4K half-words of memory */
dsp_func6(BUFFER_16);                      /* use the above memory */

(b) LINKER command file

-heap16 0x4000                            /* set the size of the dynamic 16-bit memory section */
STRB0_RAM  org = 0x880000, len = 0x8000   /* define physical 16-bit memory */
.sysm16 > STRB0_RAM                       /* assign logical section to physical memory */

(c) ’C32 external memory contents
    [Diagram: TMS320C32 STRB0 -> .sysm16 in 16-bit-wide memory;
     IOSTRB -> .bss and STRB1 -> .text in 32-bit-wide memory]
Figure C-4. Dynamic Memory Allocation for TMS320C32 (One Block Each of 32-, 16-,
and 8-Bit Memory)

(a) C code

int *BUFFER_32;                      /* declare a pointer to a pool of 32-bit memory */
int *BUFFER_16;                      /* declare a pointer to a pool of 16-bit memory */
int *BUFFER_08;                      /* declare a pointer to a pool of 8-bit memory  */
   .
*(volatile int *)0x808064 = 0x5000;  /* STRB0 control register: data size = 16, memory width = 16 */
*(volatile int *)0x808068 = 0x0000;  /* STRB1 control register: data size = 8, memory width = 8   */
   .
BUFFER_32 = MALLOC(1024 * sizeof(int));      /* allocate 1K words of memory      */
BUFFER_16 = MALLOC16(1024 * sizeof(int));    /* allocate 2K half-words of memory */
BUFFER_08 = MALLOC8(1024 * sizeof(int));     /* allocate 4K bytes of memory      */

dsp_func1(BUFFER_32, BUFFER_16, BUFFER_08);  /* use the above memory */
   .
BUFFER_32 = MALLOC(2048 * sizeof(int));      /* allocate 2K words of memory      */
BUFFER_16 = MALLOC16(512 * sizeof(int));     /* allocate 1K half-words of memory */

dsp_func2(BUFFER_32, BUFFER_16);             /* use the above memory */
   .
BUFFER_08 = MALLOC8(4096 * sizeof(int));     /* allocate 16K bytes of memory     */

dsp_func3(BUFFER_08);                        /* use the above memory */

(b) LINKER command file

-heap   0x4000    /* set the size of the dynamic 32-bit memory section */
-heap16 0x4000    /* set the size of the dynamic 16-bit memory section */
-heap8  0x4000    /* set the size of the dynamic 8-bit memory section  */

IOSTRB_RAM  org = 0x810000, len = 0x8000   /* define physical 32-bit memory */
STRB0_RAM   org = 0x880000, len = 0x8000   /* define physical 16-bit memory */
STRB1_RAM   org = 0x900000, len = 0x8000   /* define physical 8-bit memory  */

.sysmem > IOSTRB_RAM   /* assign logical section to physical memory */
.sysm16 > STRB0_RAM    /* assign logical section to physical memory */
.sysm8  > STRB1_RAM    /* assign logical section to physical memory */

(c) ’C32 external memory contents
    [Diagram: TMS320C32 external memory]
Appendix D


Memory Interface and Address Translation 


This appendix describes how to use the ’C32’s memory interfaces to connect 
to various external devices. 

The ’C32 memory interface supports variable-width memory and variable-size
data. A memory bank connected to the ’C32 can be 8, 16, or 32 bits wide. When
connecting 16-bit external memory, the A-1 address pin must be connected to
the A0 pin of the memory device, causing a 1-bit shift in the connection of the
remaining address lines. For 8-bit memory, two extra address pins are used
(A-1 and A-2), effectively shifting the external address by two bits. No
external address shift is needed for connecting 32-bit-wide memory (or boot
table memory, regardless of its width).

The ’C32 can access data of any size, regardless of the physical width of an
external memory bank. For example, byte-wide data can be packed in 16-bit
memory, or 32-bit data can be accessed from 8-bit-wide memory; the latter
takes four cycles, one per byte. The variable-data-size feature is made
possible by dividing each of the STRB0 and STRB1 strobes into four signals
(STRBx_B3 through STRBx_B0). In addition to acting as strobes, these four
control signals serve a byte-enable function.

Figure D-1 shows examples of three ’C32 systems, each connected to a 
memory bank of a different width. 

Regardless of memory width, the data inside each bank can be 8, 16, or 32 
bits wide. Before data of a particular size can be accessed, the respective 
strobe control register must be programmed for that size. While the data size 
can vary, the program is always 32 bits wide. Even if they are different sizes, 
program and data can reside within the same physical bank of memory. 

Up to two data sizes can reside simultaneously alongside the 32-bit program 
in a single bank (see Figure D-2 on page D-3). 


Figure D-1. Data and Program Packing (Program and a Single Data Size)

[Figure: program and one data size packed into 32-bit-wide and 8-bit-wide memory]

NOTE: 8-bit programs are not supported.
Figure D-2. Data and Program Packing (Program and Two Different Data Sizes)

[Figure: program and two different data sizes packed into 32-bit-wide and 8-bit-wide memory]

NOTE: 8-bit programs are not supported.


Since there are two strobes that support flexible memory (STRB0 and STRB1),
each can be programmed for a different data size through its own strobe control
register. By setting the STRB config bit in one control register, both the
STRB0 and STRB1 strobes can be mapped to the STRB0 control signals. This
creates a section of physical memory that is mapped into the same address range
as another section of memory, with a hardware switch determining which range is
active. In this overlay mode, data accesses to and from the STRB0 and STRB1
portions of the memory map drive the STRB0 signals to control a single memory
bank. Accessing the program and two different data sizes from a single memory
bank with no additional logic devices is a powerful ’C32 feature that minimizes
system cost with no performance penalty. See the TMS320C3x User’s Guide for
more information on the ’C32 enhanced external memory interface.

Address translation starts when an instruction requests a data access at a
certain external address. Address locations referenced by program instructions
are logical addresses. Before the logical address appears on the external pins
of the ’C32, it may undergo a 1- or 2-bit shift to the right that depends only
on the size of the data being accessed. The address at the pins is a physical
address. Before it is presented at the pins of the memory device, the physical
address may again be shifted (this time to the left) if the memory is other than
32 bits wide. The physical-to-memory address shift is one bit for 16-bit-wide
memory and two bits for 8-bit-wide memory. Table D-1 and Table D-2 summarize
the rules that apply to the variable data size and memory width for any ’C32
system.

Table D-1. Variable Memory Width

Memory   Strobes Valid         Physical Address      Physical Address to
Width                          Lines Valid           Memory Address Shift (bits)
------   -------------------   -------------------   ---------------------------
32       STRBx_B3, STRBx_B2,   A23-A0                0
         STRBx_B1, STRBx_B0
16       STRBx_B1, STRBx_B0    A23-A0, A-1           1
8        STRBx_B0              A23-A0, A-1, A-2      2
Table D-2. Variable Data Size

Data Size   Logical to Physical
            Address Shift (bits)
---------   --------------------
32          0
16          1
8           2


Figure D-3 through Figure D-11 show how the address changes when accessing
data of varying size from memory that is 32, 16, and 8 bits wide. The three
data sizes and three memory widths make up the nine cases that cover all
possible combinations.
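Because each case combines one data-size shift (Table D-2) with one memory-width shift (Table D-1), the nine cases can be checked with a few lines of C. This is a hedged sketch rather than a description of the ’C32 hardware: the function names are hypothetical, the model tracks only the address bits that reach the memory device, the physical pins A23-A0, A-1, A-2 are packed into one integer with A-2 as bit 0, and the 15-bit memory-address mask (7FFFh) assumes a 32K-deep device like the one drawn in Figures D-3 through D-11. In this packed form, the right shift of the logical address by s bits (so that logical bit 0 lands on A0, A-1, or A-2) becomes a left shift by 2 - s.

```c
#include <assert.h>
#include <stdint.h>

/* Shift for a data size (Table D-2) or memory width (Table D-1):
 * 32 bits -> 0, 16 bits -> 1, 8 bits -> 2. */
static unsigned size_shift(unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    return (bits == 32) ? 0u : (bits == 16) ? 1u : 2u;
}

/* Logical address -> physical pins, packed as bit 0 = A-2, bit 1 = A-1,
 * bit 2 = A0, and so on upward. */
static uint32_t logical_to_physical(uint32_t logical, unsigned data_bits)
{
    return logical << (2u - size_shift(data_bits));
}

/* Physical pins -> memory device address: the device's A0 is wired to
 * A0 (32-bit memory), A-1 (16-bit memory), or A-2 (8-bit memory). */
static uint32_t physical_to_memory(uint32_t physical, unsigned mem_bits,
                                   uint32_t mem_addr_mask)
{
    return (physical >> (2u - size_shift(mem_bits))) & mem_addr_mask;
}

/* Bus accesses per data item, e.g. four for 32-bit data in 8-bit memory. */
static unsigned bus_accesses(unsigned data_bits, unsigned mem_bits)
{
    return (data_bits > mem_bits) ? data_bits / mem_bits : 1u;
}
```

With these helpers, logical address 883FFFh holding 32-bit data in 16-bit-wide memory (the Figure D-6 case) maps to memory address 7FFEh, where the least significant half-word is stored; the second of the two bus accesses reaches 7FFFh.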
Figure D-3. Address Translation for 32-Bit Data Stored in 32-Bit-Wide Memory

[Figure: CPU instruction STI R0,@7FFFh. STRB0 control register: STRB config = 0,
memory width = 11 (32 bits), data size = 11 (32 bits). Words w1 through w32768
at logical addresses 0h-7FFFh map to memory addresses 0h-7FFFh (physical address
lines 23 to -2, memory address lines 14 to 0). Logical address shift = 0 bits
(32-bit data size).]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.


Figure D-4. Address Translation for 16-Bit Data Stored in 32-Bit-Wide Memory

[Figure: memory addresses 0h-7FFFh. Logical address shift = 1 bit (16-bit data size).]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.
Figure D-5. Address Translation for 8-Bit Data Stored in 32-Bit-Wide Memory

[Figure: CPU instruction STI R0,@FFFh; DP = 01. STRB0 control register: STRB config = 0,
memory width = 11 (32 bits), data size = 00 (8 bits). Bytes b1 through b131072 are
packed four per word into memory addresses 0h-7FFFh (b1-b4 at 0h, b131069-b131072
at 7FFFh). Logical address shift = 2 bits (8-bit data size).]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.
Figure D-6. Address Translation for 32-Bit Data Stored in 16-Bit-Wide Memory

[Figure: CPU instruction STI R0,@3FFFh; DP = 88h. STRB0 control register: STRB config = 0,
memory width = 01 (16 bits), data size = 11 (32 bits). Words w1 through w16384 at
logical addresses 880000h-883FFFh are each stored as two half-words, least
significant (ls) first and most significant (ms) second, at memory addresses
0h-7FFFh. Logical address shift = 0 bits (32-bit data size); physical address
shift = 1 bit (16-bit memory width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-7. Address Translation for 16-Bit Data Stored in 16-Bit-Wide Memory

[Figure: CPU instruction STI R0,@7FFFh; DP = 88h. STRB0 control register: STRB config = 0,
memory width = 01 (16 bits), data size = 01 (16 bits). Half-words hw1 through
hw32768 at logical addresses 880000h-887FFFh map to memory addresses 0h-7FFFh.
Logical address shift = 1 bit (16-bit data size); physical address shift = 1 bit
(16-bit memory width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.


Figure D-8. Address Translation for 8-Bit Data Stored in 16-Bit-Wide Memory

[Figure: CPU instruction STI R0,@0FFFh; DP = 90h. STRB0 control register: STRB config = 0,
memory width = 01 (16 bits), data size = 00 (8 bits). Bytes b1 through b65536 at
logical addresses 880000h-88FFFFh are packed two per half-word at memory
addresses 0h-7FFFh (b1 and b2 at 0h, b65535 and b65536 at 7FFFh), with the STRB0
byte enables (STRB0_B0, STRB0_B1) selecting the byte. Logical address shift =
2 bits (8-bit data size); physical address shift = 1 bit (16-bit memory width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-9. Address Translation for 32-Bit Data Stored in 8-Bit-Wide Memory

[Figure: CPU instruction STI R0,@1FFFh; DP = 90h. STRB1 control register: memory
width = 00 (8 bits), data size = 11 (32 bits). Words starting with w1, at logical
addresses 900000h-901FFFh, are each stored as four bytes in 8-bit-wide memory
(physical address lines 23 to -2, memory address lines 14 to 0). Logical address
shift = 0 bits (32-bit data size); physical address shift = 2 bits (8-bit memory
width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-10. Address Translation for 16-Bit Data Stored in 8-Bit-Wide Memory

[Figure: CPU instruction STI R0,@3FFFh; DP = 90h. STRB1 control register: memory
width = 00 (8 bits), data size = 01 (16 bits). Half-words hw1 through hw16384 at
logical addresses 900000h-903FFFh are each stored as two bytes in 8-bit-wide
memory (memory address lines 14 to 0). Logical address shift = 1 bit (16-bit
data size); physical address shift = 2 bits (8-bit memory width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-11. Address Translation for 8-Bit Data Stored in 8-Bit-Wide Memory

[Figure: CPU instruction STI R0,@7FFFh; DP = 90h. STRB1 control register: memory
width = 00 (8 bits), data size = 00 (8 bits). Bytes starting with b1, at logical
addresses 900000h-907FFFh, map one-to-one to memory addresses 0h-7FFFh. Logical
address shift = 2 bits (8-bit data size); physical address shift = 2 bits (8-bit
memory width).]

Notes:

1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.

